Abstract


Static type systems are traditionally used to prevent run-time type-errors in user programs 
and to assign appropriate storage representations to objects during compilation. In this thesis, 
we explore some new ways of using static type information in the design, compilation, and 
execution of programs written in a strongly-typed, polymorphic language. 


Programmers often find it useful to know whether or not a particular data-structure may be 
updated outside a given control block. Information about an object’s non-mutability helps 
compiler optimizations, improves aliasing and dependence analyses, and permits unrestricted 
caching of functional data at run-time. In the first part of this thesis, we present a safe, static 
mechanism for functional encapsulation of imperative data-structures using a powerful type 
system based on closure types and regions. We introduce a new language construct called 
close which delimits the scope of side-effects on imperative objects and converts them into 
functional objects outside that scope. This mechanism may be used to build efficient, high- 
level, functional data-abstractions within a language using its low-level, imperative constructs. 
Type-safety and non-mutability of closed objects are guaranteed by a semantic soundness theorem
that ensures consistency between the static and the dynamic semantics. The type system is 
presented in the context of Id, which is a strongly-typed, polymorphic, higher-order language, 
and it easily simplifies to a first-order, monomorphic language such as C or Fortran. 


In the second part of the thesis, we develop a general, compiler-directed methodology for com- 
plete type reconstruction of run-time objects in a polymorphic language without using any 
run-time type-tags. Run-time type reconstruction is carried out by instantiating static type 
information for each function activation frame present within the dynamic call tree. Additional 
type-hints are inserted automatically at compile-time and are decoded at run-time to ensure 
complete type reconstruction. We present the necessary compiler analysis and the type re- 
construction algorithm and prove their correctness. This technique has been used successfully 
for displaying run-time objects within the Id source debugger for Monsoon and to perform 
tagless garbage collection in the *T architecture. We describe the latter application in detail, 
comparing its performance with other schemes for automatic storage reclamation. 


Key Words and Phrases: Functional Encapsulation, Operational Semantics, Polymorphism, 
Type Soundness, Imperative Typing, Closure Typing, Regions, Typed Run-time System, Type 
Reconstruction, Type Conservation, Type-Hints, Tagless Garbage Collection. 
Chapter 1


Introduction 


One of the main goals of modern, high-level programming languages is to provide an intuitive 
programming model that is useful in writing applications and reasoning about their behav- 
ior. Some languages enforce a style of programming that guarantees useful properties for the 
programs written in those languages. In this thesis, we concentrate on a class of high-level 
languages that are strongly-typed. Strong-typing enforces type-consistency which imparts a de- 
gree of robustness to the program. A type-consistent program is guaranteed never to run into 
a run-time type-error, e.g., attempting to use an integer in a floating-point computation or 
applying a non-function object to an argument. 

In a strongly-typed language, type-consistency can be enforced during compilation (static 
typing) or during execution (dynamic typing). The compiler for a statically-typed language has 
to be somewhat conservative in enforcing type-consistency: it may reject certain programs that 
appear to be inconsistent, although such programs may not encounter a run-time type-error for 
certain inputs (or even for all possible inputs). The advantage of being conservative is that a 
program that has been statically determined to be type-consistent is guaranteed to execute in
a type-consistent manner for all inputs. Therefore, no checks for type-consistency need to be 
made during its execution. 

Type information is primarily used in statically-typed languages to check for type-consistency 
within programs and to choose memory representations for data-structures. Most of this in- 
formation is thrown away once a program has been compiled. At most, some type information 
may be saved in a symbol table to be used by a source-level debugger. In this thesis, we wish 
to explore more fundamental ways in which to incorporate and use type information in the 
design, compilation, and execution of a program written in a strongly-typed language. We wish 
to use the type system of our language as a tool for structuring the language design into tight
abstraction layers, for supporting compiler optimizations and automatic code generation,
and for supporting run-time facilities such as source-level debugging and garbage collection.


1.1 Layered Language Design 


Modern, high-level languages offer a variety of data and control abstraction mechanisms to 
enable users to structure their applications properly. Most programming language designs 
fall into one of the following two categories: either a language includes a large repertoire of 
common datatypes and their manipulation functions as part of its definition as in Common 
Lisp [SJ90], or these objects are defined separately in a standard prelude or in system and user 
libraries as in the case of Standard ML [MT91, MTH90], Haskell [HWe90], or C [Pla92]. The 
first approach sometimes leads to language definitions that may be too large to understand,
implement and reason about. The second approach usually leads to small and simple language 
kernels that may be used to “implement” high-level datatypes and their associated functions 
as independent libraries. This approach seems better in terms of overall ease of understanding 
and maintainability of the language, though it requires a careful design of the libraries and their 
user interface. 

The recent success of strongly-typed, polymorphic, functional languages, such as Haskell 
and Standard ML, highlights the importance of this layered approach to language design. Small
language kernels have manageable semantic complexity and can be subjected to powerful rea- 
soning techniques. At the same time, a small set of kernel primitives that can be suitably 
mapped to the underlying architecture provide a flexible and efficient means of implementing 
pre-defined and user-defined high-level datatypes. In order for such kernel implementations to 
be sound and transparent to the end-user, a proper data and type abstraction mechanism must 
be provided in the kernel language. Otherwise, the semantic correctness of the implementation 
may be in doubt. An example of this situation is the C language [KR88] which offers complete 
flexibility of a low-level kernel language but lacks a tight abstraction mechanism, leading some- 
times to subtle errors in user programs. For this reason, many high-level languages offer only a 
fixed set of high-level constructs with pre-defined semantics rather than provide the user with 
the complete flexibility and the raw power of a low-level kernel language. The list comprehension
mechanism, first introduced in NPL [Bur77, Dar77] and later adopted in Miranda [Tur85]
and Haskell, is an example of such a language construct. 

From a language design standpoint, a powerful type system can be used to enforce the type 
abstraction desired for kernel language implementations of high-level datatypes in libraries 
without changing the high-level language definition or modifying the compiler. In Part I of this 
thesis, we are going to present a type system that will allow us to build a data-structure in a 
low-level, imperative style and then safely encapsulate it as a functional data-structure. The 
motivation for doing so is as follows. 

First, our type system reduces the complexity of writing compilers for functional languages. 
Functional syntactic constructs, such as list and array comprehensions, that have to be im- 
plemented within the compiler as primitive constructs, can instead be desugared into ordinary 
functions that are implemented in an independent system library. This is possible because we
allow the programmer to implement the library using low-level imperative constructs
that are safely encapsulated within functional abstractions provided by the type system. This
approach is also very flexible since it allows modification and extension of existing language 
constructs as well as addition of new constructs without disturbing the bulk of the compiler. 

Second, our type system provides a way to safely implement functional computations using 
imperative algorithms that cannot otherwise be expressed in a functional style efficiently. No- 
table examples that have this characteristic are accumulation (histogramming) algorithms and 
graph algorithms. Although the final result may be functional, the computation often needs to
be performed in an imperative way in order to achieve efficiency in space and time. Using our 
type system, an imperative computation can be safely embedded within a functional program 
while still preserving its clean semantics and simple reasoning. 
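
As a rough illustration of this second point, the following Python sketch builds a histogram by in-place accumulation and hands back an immutable result. The name histogram and its parameters are hypothetical and do not come from the thesis. Python enforces the read-only result only at run-time, whereas the type system of Part I is meant to establish the same property statically.

```python
def histogram(samples, num_buckets):
    """Count how many samples fall into each bucket 0..num_buckets-1."""
    counts = [0] * num_buckets          # mutable accumulator, local to this call
    for s in samples:
        if not 0 <= s < num_buckets:
            raise ValueError(f"sample {s} outside 0..{num_buckets - 1}")
        counts[s] += 1                  # imperative, constant-time update
    return tuple(counts)                # freeze: callers receive a read-only value


result = histogram([0, 2, 2, 1, 2], 3)
```

The mutable list never escapes the function, so no caller can observe or perform an update on the result; this is the situation that the encapsulation mechanism is intended to verify automatically.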

From the standpoint of a compiler, working hand-in-hand with a powerful type system can 
prove to be more fruitful than working around it, as most compilers tend to do. Static types of 
program fragments provide valuable information about “what” is being computed. The shape 
and size of data-structures and the input/output parameters of functions can be determined 
using their static types. Intelligent compilers can use this information while performing impor- 
tant optimizations such as boxing/unboxing of data, code specialization, and register allocation. 


Unfortunately, very few compilers actually propagate the full source type information all the 
way to the back-end, the Glasgow Haskell compiler [PJ92] being a notable exception. In a lay- 
ered language, this task is considerably simplified since only a small number of kernel language 
constructs are involved within the later phases of the compiler. 

It is also possible to use source type information at run-time to display objects during ex- 
ecution, or to output them to a file, or to perform garbage collection. A run-time system that 
has access to complete source type information from the compiler may not need to maintain 
such information independently, say in the form of object type-tags, in order to handle such 
applications. The compiler and the run-time system could be made to cooperate in automat- 
ically recreating and using this type information when needed. In Part II of this thesis, we 
will explore the technique of run-time type reconstruction that reconstructs the exact type of 
every object on demand without paying the overhead of type maintenance. Furthermore, we 
will explore ways in which static type information can be used to automatically generate spe- 
cialized routines at compile-time for each data and control object within the program in order 
to perform such tasks. 


1.2 Id: A Strongly-Typed, Layered Programming Language 


The idea of using type information within the design of a high-level language, its compiler, or 
its run-time system is not new. However, very few systems make use of source type information
right from the design of an application all the way down to its execution in a coherent manner. 
This research is geared towards such an integrated approach to managing type information in 
the context of the parallel programming language Id [Nik91], developed at the Computation 
Structures Group, Laboratory for Computer Science, MIT. 

Id is a high-level, strongly-typed language and it uses the Hindley/Milner polymorphic type 
system and its automatic type inference mechanism [Mil78, DM82] at its functional core. Id 
also offers imperative data-structures (I-structures [ANP89] and M-structures [BNA91]) that 
cater to imperative styles of programming. Id is a layered language by design (see Figure 1.1). 
The language and its implementation can be divided into three distinct layers: the user-level 
functional layer, the system-level imperative layer, and the architecture-level implementation 
layer. 

At the highest level of functionality, the Id language provides high-level constructs such as 
arrays, lists, tuples, higher-order functions, and user-defined algebraic types. Special syntactic 
constructs, such as array and list comprehensions and pattern matching are also provided. 
Applications manipulating these objects make use of system and user libraries that support or 
extend the functionality provided by the compiler. 

The system-level layer consists of the Id kernel language. The primitive I-structure and
M-structure datatypes provide the basic data-structuring and synchronized memory access 
mechanisms in this language. These primitive datatypes are used to represent all high-level 
data-structures. Loops and procedures constitute the basic control mechanism. The compiler 
translates high-level syntactic constructs such as pattern matching, and list and array compre- 
hensions into primitive operations on kernel datatypes. The system and user libraries may also 
make use of these kernel constructs to implement high-level data-structures. 

Finally, the architecture-level layer consists of the run-time system of the language and is 
responsible for implementing the Id execution model and managing the synchronized memory. 
The compiler also generates type information and run-time support code for garbage collection 
and source-level debugging that can be directly linked along with the object code to perform 
these auxiliary tasks during execution.

Figure 1.1: The Layered Design of the Id Language.

This layered design presents a very flexible interface to the application writer where more 
functionality can be added to the user-level simply by adding more system-level libraries writ- 
ten in the kernel language. The type system is responsible for clearly defining and enforcing 
the abstraction between the two layers so that polymorphic, functional behavior and simple 
reasoning can be preserved at the user-level. At the same time, the Id run-time system is able 
to map the system-level kernel language constructs onto the underlying target architecture in 
an efficient way, independent of the source language used. 


1.3 Applications to Conventional, Imperative Languages 


The functional encapsulation mechanism described in this thesis is not only applicable to higher- 
order, polymorphic languages like Id and Haskell, but also to conventional, monomorphic lan- 
guages like C and Pascal. This mechanism allows safe conversion of mutable objects into 
read-only functional objects. This transformation is useful for both sequential and parallel 
versions of conventional imperative languages. We discuss some of these uses below.

The most important property of a functional object is that its value does not change during 
the course of execution of the program. Therefore, a functional object may be freely copied 
if necessary, or conversely, excessive copies may be freely eliminated. This property leads to 
obvious compile-time optimizations such as common sub-expression elimination, code-hoisting, 
and memory-fetch elimination that attempt to reduce the number of copies. This also permits 
unlimited caching of such functional data in a parallel machine without any risk of write- 
invalidation. In parallel systems using software-controlled shared-memory protocols [Nik94, 
FLR+94], this may translate directly into cheaper protocols for object access and migration.

While writing parallel programs, programmers often make implicit assumptions that a 
shared, mutable object may not be updated outside a given control block or that a partic- 
ular processor may have exclusive access to a shared object without actually locking it. Such 
assumptions are usually based upon the implicit logic of the program and as such it may be quite 
difficult to prove their correctness. With a little help from the user in identifying such objects, 
the encapsulation mechanism described in this thesis can verify such assumptions automati- 
cally. This mechanism also allows making safe, unsynchronized access to such shared objects 
outside their encapsulated control-block because the objects are guaranteed to be read-only at 
that point. 

Finally, conversion of mutable objects into functional objects also improves other compile- 
time analyses such as memory-aliasing analysis and loop-dependence analysis by clearly dis- 
ambiguating between read-only and read-write data. This, in turn, may benefit automatic 
parallelization of sequential programs that make use of such analyses. 

Thus, providing the ability to restrict the scope of side-effects to mutable data-structures 
translates into important optimizations at all levels of program design and implementation. 
This thesis provides the basic type-based framework for making such optimizations feasible. 


1.4 Outline of the Thesis 


1.4.1 Part I 


This thesis is divided into two parts. In Part I (Chapters 2, 3 and 4), we describe a powerful 
type system that has the ability to encapsulate programs constructing mutable data-structures 
and view them as returning functional data-structures while guaranteeing that no more updates 
take place on the returned objects outside the encapsulation. 

Chapter 2 is an informal and intuitive condensation of the major ideas in Part I. We 
introduce the problem by means of a simple example involving functional arrays in Id. We 
briefly survey the literature comparing various existing imperative type systems and informally 
describe our solution as an extension to one of the existing type systems. Then, we discuss 
“language-level issues” such as type-safety, polymorphism and non-mutability within our type 
system and how they interact with “system-level issues” such as space and time efficiency, 
parallelism, and memory synchronization. Finally, we describe specific strategies used in our 
type system that take care of these issues. 

Chapter 3 describes the formal machinery and the soundness proof of our type system that 
is the main theoretical contribution of Part I. We start with the description of a small, im- 
perative language containing simple mutable locations and a special language construct called 
close to convert them into immutable locations. We provide the dynamic and static semantics 
of this language in terms of relational axioms and inference rules and show their useful seman- 
tic properties. Then, we set up a semantic model that defines a consistent relation between 
values and their types. This relation maps read-write locations to mutable types and read-only
locations to functional types. Finally, we prove a soundness theorem stating that the static and 
the dynamic semantics of our expression language are consistent with respect to each other. 
It follows immediately that the mutable objects that are successfully converted into functional
objects under our type system are never updated again dynamically.

Chapter 4 extends the formal machinery of Chapter 3 to complex datatypes such as arrays, 
tuples, functions, and general algebraic datatypes. We discuss how the user would syntactically 
specify the conversion of arbitrary, imperative data-structures into functional ones and how the 
compiler would automatically verify the soundness of this conversion. Finally, we summarize
the results of Part I and discuss directions for future research. 


1.4.2 Part II 


In Part II (Chapters 5, 6, 7 and 8), we study the technique of complete run-time type recon- 
struction and its various applications within the run-time system for Id. 

Chapter 5 discusses some design issues that affect the use of type information within a 
run-time system. There are “language issues” such as strong vs. weak typing, and allowing 
polymorphism and higher-order functions in the language; “compiler issues” such as using static 
vs. dynamic typing, and type inference vs. type declaration; and “run-time system issues” 
such as using tagged vs. untagged object representation model, and using type maintenance vs. 
type reconstruction to obtain type information at run-time. We classify various programming 
languages on the basis of these issues. We also discuss our approach of complete run-time 
type reconstruction with an untagged object representation model and discuss some of its 
applications such as source debugging, tagless garbage collection and I/O. 

Chapter 6 motivates the problem of compiler-directed polymorphic type reconstruction by 
means of examples and describes the technique informally. First, we describe the logical execu- 
tion model of an Id program, dividing the work into compile-time, link-time, invocation-time 
and run-time. Then, we characterize the type information that needs to be recorded at compile- 
time to permit complete type reconstruction at run-time. Next, we informally describe how to 
analyze and translate the source program to propagate this information. Finally, we show the 
process of run-time type reconstruction using an example and discuss some optimizations. 

Chapter 7 formalizes the concepts of Chapter 6 using a simplified kernel language for Id. 
This language is very close to the actual intermediate form used within the Id compiler. We 
present the analysis and program translation rules to generate and propagate all the necessary 
type information at compile-time. We also present a formal algorithm for type reconstruction 
and prove its correctness. 

Finally, Chapter 8 discusses a full-scale application of type reconstruction: tagless garbage
collection. We describe a study that compares the performance of our type-reconstruction-based
garbage collection scheme with conservative garbage collection and a compiler-directed explicit 
deallocation scheme. 


1.4.3 How to Read this Thesis


Both Part I and Part II are self-contained and may be read independently.

For Part I, Chapter 2 should be sufficient for readers that are only interested in understand- 
ing the problem, its context, and the intuitive ideas behind the proposed solution. Readers 
interested in the mechanics of the proposed type system and its extensions, possibly with a 
view towards implementing it, should look at the semantic machinery described in Sections 3.1 
and 3.2 of Chapter 3, the extensions discussed in Chapter 4, as well as the type inference machinery
described in Chapter 3 of [Ler92]. Of course, theoretical enthusiasts may want to go
through all the detailed proofs provided in Chapter 3. 

For Part II, Chapter 5 and Chapter 6 provide a general introduction to the idea of using type
information at run-time, an intuitive description of the issues involved, and the technique of 
complete type reconstruction and its various applications. Chapter 7 is a must for readers inter- 
ested in the detailed understanding and implementation of the type reconstruction mechanism, 
although the last section on the correctness proof of the reconstruction algorithm is mainly of 
theoretical interest. Finally, Chapter 8 provides a realistic perspective on the potential uses of 
this technique in the context of tagless garbage collection, and its cost trade-offs. 
Part I


Types in Language Design: 
Functional Encapsulation 


Chapter 2 


Functional Encapsulation of 
Imperative Data-Structures 


In this chapter, we study the problem of providing a suitable type abstraction mechanism 
between the user-level layer and the underlying system-level layer in a programming language. 
We introduce a new language construct called close that provides a statically verifiable, safe 
export mechanism for imperative data-structures from the system-level layer into the functional, 
user-level layer. We present several examples illustrating the usefulness of this construct and
discuss the technical issues involved in proving its soundness. We also compare our approach
to other systems in the literature. 


2.1 The Problem 


Let us consider the problem of implementing functional arrays in Id that are homogeneous, 
non-mutable, polymorphic arrays. The library function make_vector creates a one dimensional 
functional array that memoizes a computation for a given index range as shown in the following 
example:¹


Example 2.1: 
def compute i =... some large computation ...; 
compute_memo = make_vector compute (1,10); 


How is the function make_vector implemented? Operationally, one has to allocate an empty 
vector and fill it with the result of applying the given function to each index position. There are 
two possibilities. We could treat make_vector as a language primitive and hard-wire it within 
the compiler. Then, we would have to provide a slew of such primitive functions that define 
functional vectors, matrices, and higher dimension arrays, along with their common patterns 
of construction. Some languages (including Id) provide special array construction syntax called 
array comprehensions to alleviate this problem. While array comprehensions are convenient, 
they still do not cover many useful construction patterns. They also increase the complexity of 
the compiler and the language it must manipulate. Moreover, this solution does not apply to 
user-defined functional abstractions in addition to those already present in the language. 


¹ All our examples use the Id language syntax [Nik91]. We will provide brief explanations as necessary.
Function definitions in Id are introduced with the keyword def, all statements are terminated with a semi-colon 
(;) and application is by juxtaposition. 
The other possibility is to provide an imperative kernel language using which make_vector
and other array construction functions may be defined in a separate library. Special syntactic 
constructs like array comprehensions may also be desugared into this kernel language. The ker- 
nel language would support primitive operations such as simple arithmetic, allocating a vector, 
storing/fetching a value at a particular index of a vector, and simple control mechanisms such 
as iteration and procedure call. This is the approach taken in Id. This approach also enables 
a system programmer to implement arbitrary new abstractions without changing the language 
definition or the compiler. As an example, the make_vector function may be implemented in 
the array library as shown below:²


Example 2.2: 
def make_vector f (l,u) =
  { a = i_vector (l,u);
    _ = { for i <- l to u do
            a[i] = f i };
  in a };


Here, i_vector is a kernel primitive that allocates an empty one dimensional I-structure
array, which is filled with the result of applying the filling function to each index position. 
Under the non-strict, parallel evaluation model of Id, the array a is returned as soon as it is 
allocated; its filling loop executes in parallel. However, this does not create any race condition 
for the array because the I-structure protocol [ANP89] supports fine-grain producer-consumer
synchronization on every memory location: multiple readers wait at an empty location until a 
single writer fills it with the desired value. 
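
The synchronization behavior described above can be approximated in Python with threads. This is only a sketch under stated assumptions: the class IStructureCell, its put and get methods, and the use of a background thread are hypothetical stand-ins for Id's I-structure memory and non-strict evaluation, not part of Id or its run-time system.

```python
import threading


class IStructureCell:
    """Write-once cell: reads block until the single write has happened."""

    def __init__(self):
        self._filled = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def put(self, value):
        with self._lock:
            if self._filled.is_set():
                raise RuntimeError("cell already written")
            self._value = value
            self._filled.set()

    def get(self):
        self._filled.wait()  # readers wait here while the cell is empty
        return self._value


def make_vector(f, bounds):
    lo, hi = bounds
    cells = [IStructureCell() for _ in range(lo, hi + 1)]

    def fill():
        for i in range(lo, hi + 1):
            cells[i - lo].put(f(i))

    # The vector is returned immediately; filling proceeds concurrently.
    threading.Thread(target=fill, daemon=True).start()
    return cells


squares = make_vector(lambda i: i * i, (1, 10))
first_ten = [cell.get() for cell in squares]
```

Each get call blocks until the corresponding put has occurred, so a reader never sees an empty slot even though make_vector returns before the loop finishes. Note that the returned cells still expose put to the caller.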

Nevertheless, as it stands, there are some technical problems with the above implementation.
I-structures and M-structures are imperative constructs, i.e., they can be assigned to,³
whereas functional arrays are supposed to be non-imperative. Therefore, returning an assignable 
I-structure from make_vector is not appropriate. Furthermore, in the Hindley/Milner type sys- 
tem, imperative objects are allowed only a restricted form of polymorphism to ensure type-safety 
[Tof90]. Thus, the functional arrays implemented using I-structures in this manner would have
restricted polymorphism, which reduces the utility of such library implementations. 

Both problems described above may be solved by providing the ability to package the above 
implementation of the make_vector function into a type-safe, polymorphic, functional abstrac- 
tion as required by its intended interface. In general, the kernel language should contain a 
general type abstraction mechanism that can properly encapsulate such imperative implemen- 
tations of functional data-structures while ensuring their polymorphism and non-mutability 
outside the abstraction. 


2.1.1 Abstraction and Polymorphism 


It may appear that a conventional abstract datatype facility available in most modern languages 
should be sufficient for our purpose. Indeed, we could write a functional array datatype that 
is internally represented using I-structures and does not allow any mutation capability in its 
abstract interface. But, such an abstraction is still not completely satisfactory because it only 


² All bindings within a let-block (enclosed within {}) execute in parallel. The bindings may be mutually
recursive and their textual order is unimportant. An underscore (_) on the left-hand-side of a binding implies
that the result of the right-hand-side expression is to be ignored. The result of the overall block is the value of
the expression following in.
³ Strictly speaking, I-structures are not mutable, since they have write-once semantics, but an empty
I-structure can be filled with any value using assignment.
hides the internal data representation of the functional datatype; it does not automatically
restore the full polymorphism of the functional datatype from the restricted polymorphism of 
its imperative implementation. This polymorphic strengthening of the datatype has to be done 
explicitly and under additional semantic analysis that guarantees its soundness. As we will see 
in Section 2.2, the treatment of type polymorphism is significantly complicated by the
presence of imperative constructs under the usual call-by-value semantics.

A radically different but equally interesting approach is to define a call-by-name semantics 
for the polymorphic objects within the kernel language and permit unrestricted polymorphism 
for imperative programs. In this case, the conventional data abstraction mechanism would be 
sufficient to hide the imperative implementation of a functional datatype. In this alternate 
semantics, called polymorphism-by-name [Ler93], the evaluation of a polymorphic object is 
suspended and each type instantiation re-evaluates the suspension in the current context to 
produce a fresh instance. In contrast, the usual ML-like polymorphism is called polymorphism- 
by-value where polymorphic objects are evaluated only once and the resulting value is shared 
among all its instances. 
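
The difference between the two semantics can be sketched in Python, under loudly stated assumptions: Python has no static type instantiation, so each use site below merely stands in for one type instance, and Ref, shared, and fresh are hypothetical names chosen for illustration.

```python
class Ref:
    """A mutable reference cell, standing in for an ML-style reference."""

    def __init__(self, contents):
        self.contents = contents


# Polymorphism-by-value: the defining expression is evaluated once, and
# every "instance" shares the resulting cell.
shared = Ref([])
as_int_list = shared   # imagined use at type "list of int"
as_str_list = shared   # imagined use at type "list of string"
as_int_list.contents.append(1)
# The integer is now visible through the "string" instance as well.


# Polymorphism-by-name: the definition is a suspension, and every
# instantiation re-evaluates it to obtain a fresh cell.
def fresh():
    return Ref([])


by_name_int = fresh()
by_name_str = fresh()
by_name_int.contents.append(1)
# The second instance is unaffected, but the two no longer share state.
```

The first half shows the kind of sharing that unrestricted polymorphism over imperative objects would make unsafe; the second half shows that re-evaluation avoids the problem only by giving up sharing between instances.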

Leroy showed in [Ler93] that the naive Hindley/Milner typing rules are sound with respect to 
polymorphism-by-name semantics for imperative references and continuations. This approach 
is used in languages like Quest [Car89] and to a limited extent in CLU [LAB+81] where explicit
type parameters are used to abstract and instantiate polymorphic objects. Unfortunately, 
suspension and re-evaluation of polymorphic objects destroy their sharing characteristics which
are very important in the dynamic semantics of the Id language. Therefore, we would like to 
improve upon the abstraction characteristics of the polymorphism-by-value type systems that 
preserve such sharing. 
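
The difference in sharing behavior between the two semantics can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: the hypothetical function make_cell stands in for the evaluation of a polymorphic expression that allocates a mutable object, and a "type instantiation" is modeled as taking one more handle on it.

```python
# Hypothetical illustration: a "polymorphic object" is modeled as the result
# of make_cell, which allocates a fresh mutable cell (a one-element list).

def make_cell():
    """Stands in for evaluating a polymorphic expression such as `mkref nil`."""
    return [None]

# Polymorphism-by-value: evaluate once, share the value among all instances.
shared = make_cell()
by_value_a = shared                  # "instance" used at one type
by_value_b = shared                  # "instance" used at another type
by_value_a[0] = 1
value_sharing = by_value_b[0] == 1   # the update is visible through both instances

# Polymorphism-by-name: keep the suspension, re-evaluate at each instantiation.
suspension = make_cell
by_name_a = suspension()
by_name_b = suspension()
by_name_a[0] = 1
name_sharing = by_name_b[0] == 1     # each instance is a fresh cell
```

Under the by-name reading, the two instances never alias, which is why the naive typing rules stay sound but also why the sharing that Id relies on is lost.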

In another case, Wright experimented with the type system of Standard ML by restrict- 
ing polymorphism to only certain classes of syntactically recognizable values such as function 
declarations, constants, and known functional constructors [Wri93]. These functional values 
can be recognized statically and therefore can be generalized and shared safely. Mutable data- 
structures are always classified as dynamic entities and therefore can never be generalized. He 
showed empirical evidence that this restricted form of polymorphism is sufficient for a large 
class of existing Standard ML programs. Unfortunately, our ultimate goal is to provide a
functional and polymorphic view of dynamically created imperative data-structures, for which this
approach is entirely inadequate.


2.1.2 Outline of our Approach 


We can divide our problem into two distinct phases. First, it is important to be able to give 
sound and accurate type semantics to imperative constructs in the kernel language. We must 
precisely capture the imperative types of mutable objects and propagate them with first class 
status while handling higher-order functions, storing into data-structures and passing them 
around as arguments and results. 

The second phase involves presenting a functional view of the mutable objects to the end 
user. This may involve a semantic check on the part of the compiler (or the system programmer) 
as well as some sort of type conversion to convert the imperative types into functional types. The 
type system must ensure that fully functional and polymorphic behavior is projected through 
the abstraction both in static and dynamic semantics. We present a new language construct 
called close that achieves this functionality through the type system. 

The interaction of polymorphism and imperative programming has been the subject of 
active research in the past decade [Dam85, Tof90, AM89, LW91, Ler92, TJ92, Wri92]. Several 
type systems have been proposed in the literature spanning a wide range of expressiveness and
complexity. We present a brief survey in Section 2.2. Since our main task is to provide an 
encapsulation mechanism for imperative program fragments, we prefer to extend an existing 
imperative type system that meets our needs rather than design a new one. We have chosen 
the Closure Typing system proposed by Xavier Leroy in his thesis [Ler92] as a convenient
starting point for our encapsulation extensions. We motivate this choice in Section 2.2.7. 
In Section 2.3, we informally describe the meaning and the use of the close construct via 
examples and discuss issues of type-safety, non-mutability, efficiency, parallelism, and memory 
synchronization in their context. Finally in Section 2.4, we present informal typing strategies 
that ensure the soundness of the close construct. 


2.2 Imperative Type Systems 


It is well known that the simple Hindley/Milner type system yields unsound typing when applied
to mutable data-structures in the naive way. In this section, we briefly review this problem and 
describe some practical extensions to the type system that handle it to some extent. 


2.2.1 Simple Hindley/Milner Type Inference 


Consider the following example⁴ in Id that emulates the ref construct of ML using a naive,
functional Hindley/Milner type system: 


Example 2.3:
type (ref t₀) = mkref !t₀;        % mkref :: ∀t₀. t₀ → (ref t₀)

r = mkref identity;               % r :: ∀t₁. (ref (t₁ → t₁))

r!!mkref_1 = square;
_ = r!!mkref_1 true;              % Dynamic Type Error!


The datatype ref defines a polymorphic constructor mkref that allocates a mutable cell and 
initializes it with a given value. The value contained in the cell can subsequently be updated 
by field assignment as shown above. 

The type schemes⁵ for the constructor mkref and the mutable cell r inferred by the naive
type system are shown on the right. Note that the mutable cell r is given a polymorphic type
which can be instantiated to int → int or bool → bool as desired. Thus, this example passes
the type system even though it causes a run-time type-error, attempting to apply an integer 
function square to a boolean true. The problem is that the type of a mutable object should not 
be deemed polymorphic even if it initially contains a polymorphic value. This is because later 
such objects may be updated to contain values that do not possess the expected polymorphism. 
The type system must be aware of such mutable objects and keep track of their types in a 
sound manner. 

One way to avoid such unsound polymorphism is to statically approximate the state of the 
mutable store and the set of objects stored within it. These (presumably) mutable objects are 


⁴User-defined types and constructors in Id are introduced with the keyword type. A (!) in front of a
constructor field denotes that it is mutable, and can be used in M-take/M-put (!) or examine/replace (!!) operations
during field access. Fields are accessed by position using numeric suffixes (starting from 1) to the constructor
name. Although Id has parallel semantics, our examples assume a sequential order of evaluation for simplicity.

⁵A type-scheme is a polymorphic Hindley/Milner type containing type variables, such as t₀, t₁, ..., some of
which may be bound by the universal quantifier (∀). Bound type variables may be substituted for different types
in different contexts giving rise to polymorphic type-instances.
not allowed to have polymorphic types. The mutable store approximation needs to be updated
whenever there is a possibility of allocating a new mutable object or updating an existing 
mutable object. This has to be achieved in a flexible but sound manner within and across 
function and local block boundaries. 

Many type systems in the literature follow this general framework [Dam85, Tof90, LG88, 
JG91, Wri92, TJ92]. The various systems differ in their notion of a store abstraction and the 
amount of information propagated across function boundaries. An illustrative comparison of 
some of these systems is presented in [OJ91]. First, we will briefly describe two such systems 
that are simple extensions of the original Hindley/Milner type system and have been successfully
used in practical programming languages. Then, we will describe two more recent type systems 
that are more complex but are much more powerful in dealing with higher-order functions. 


2.2.2 Type System of Standard ML 


In Standard ML [MT91, MTH90], type variables are syntactically classified into two separate
categories: imperative type variables (u₀, u₁, ...) that may occur in the type of a mutable
object at some stage of type inference and therefore implicitly model the abstract mutable store,
and applicative type variables (t₀, t₁, ...) that can never occur in the type of a mutable object.
Furthermore, since the evaluation of variables and λ-expressions (termed non-expansive
expressions) never generates any new mutable objects, imperative type variables occurring in
their types are allowed to be generalized.⁶ In contrast, applications and let-expressions (termed
expansive expressions) may allocate new mutable objects on evaluation; therefore, imperative
type variables occurring in their types are not allowed to be generalized. The resulting type
system is sound [Tof90], easy to implement, and correctly rejects Example 2.3 as a type-error.
To see this, note that under this scheme, storage allocating functions such as mkref always
contain imperative type variables in their type-schemes because they allocate and return mutable
memory locations. Thus, the type of r cannot be generalized since it will contain an imperative
type variable that is created in an expansive expression (application).
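
The generalization rule just described can be sketched in a few lines of Python. This is a simplified model and not the Standard ML definition itself: types are nested tuples, variable names beginning with t are taken to be applicative and those beginning with u imperative, and expressions are reduced to a hypothetical tag ("var", "lambda", "app", "let") that decides expansiveness.

```python
import re

# Assumed naming convention: t0, t1, ... are applicative; u0, u1, ... imperative.
VAR = re.compile(r"^[tu]\d+$")

def type_vars(ty):
    """Collect the type variables of a type given as a nested tuple."""
    if isinstance(ty, str):
        return {ty} if VAR.match(ty) else set()
    return set().union(*(type_vars(arg) for arg in ty[1:]))

def is_expansive(expr):
    """Variables and lambda-abstractions are non-expansive; applications and
    let-expressions are treated as expansive."""
    return expr[0] not in ("var", "lambda")

def generalizable(expr, ty, env_vars=()):
    """Type variables of `ty` that may be bound by a universal quantifier."""
    candidates = type_vars(ty) - set(env_vars)
    if is_expansive(expr):
        # Imperative variables of an expansive expression stay monomorphic.
        candidates = {v for v in candidates if not v.startswith("u")}
    return candidates

# r = mkref identity  (an application whose type is ref (u1 -> u1))
r_vars = generalizable(("app", "mkref", "identity"), ("ref", ("->", "u1", "u1")))

# A lambda-abstraction of type u0 -> u0 is non-expansive.
fn_vars = generalizable(("lambda", "x"), ("->", "u0", "u0"))
```

Here r_vars is empty, so r stays monomorphic and Example 2.3 is rejected, while the imperative variable of a λ-abstraction is still generalized.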

One of the problems with this system is that the modeling of imperative objects is too 
simplistic. The imperative type variables model values contained in a mutable location rather 
than the locations themselves. This has the effect of “contaminating” the types of the values 
fetched out of mutable locations. Consider the following example: 


Example 2.4:
def identity x = x;                      % identity :: ∀t₀. t₀ → t₀
def identity’ x = (mkref x)!!mkref_1;    % identity’ :: ∀u₀. u₀ → u₀

nil’ = identity’ nil;
x = 1:nil’;
y = true:nil’;                           % Static Type Error!


Although identity’ is assigned a polymorphic type, it is still weaker than the type of the 
identity function. This is because the identity’ function temporarily stores its argument 
within an imperative location. This contaminates the type of the returned result to be impera- 
tive and unnecessarily restricts its polymorphism as shown. We would have liked to assign the 
same type to both identity functions. 

Another problem with this system is that the distinction of expansive and non-expansive 


⁶Type generalization refers to the process by which some type variables occurring in a type are bound with a
universal quantifier (∀) converting that type into a polymorphic type scheme.
expressions is also very simplistic. In particular, this system cannot deal with higher-order or
partially applied imperative functions. Consider the following example: 


Example 2.5:
mkref’ = identity mkref;      % mkref’ :: u₀ → (ref u₀)
x = mkref’ 1;
y = mkref’ true;              % Static Type Error!


The application of the identity function to mkref strips out the polymorphism of mkref because
the type system deems this application expansive whether or not any mutable reference was
allocated within the identity function. This causes an unnecessary type-error to be flagged
by the type system. The following example illustrates a similar problem for partial applications:


Example 2.6:
def imp_map f l =                      % imp_map :: ∀u₀u₁. (u₀ → u₁) → (list u₀) → (list u₁)
  { arg = mkref l;                     % arg :: (ref (list u₀))
    res = mkref nil;                   % res :: (ref (list u₁))
  in
    {while not (nil? arg!!mkref_1) do
       x:xs = arg!mkref_1;
       arg!mkref_1 = xs;
       res!mkref_1 = f x : res!mkref_1;
     finally reverse res!mkref_1}};

def fn_map f nil = nil                 % fn_map :: ∀t₀t₁. (t₀ → t₁) → (list t₀) → (list t₁)
 |  fn_map f (x:xs) = f x : fn_map f xs;

list_identity = imp_map identity;      % list_identity :: (list u₀) → (list u₀)
u = list_identity (1:2:nil);
v = list_identity (true:false:nil);    % Static Type Error!


Just like the identity’ function in Example 2.4, the type-scheme assigned to the function
imp_map in the above example contains imperative type variables because it uses mutable
locations internally, while its functional version fn_map carries only applicative variables.
Furthermore, when using imp_map, although no actual allocations take place until after its second
argument, the type system has no way to determine this and it deems the first application to
be expansive as well. This results in a non-polymorphic type for the list_identity function
as shown. This problem of typing partial applications was fixed in part by the type system of
Standard ML of New Jersey, which we discuss next.


2.2.3 Type System of Standard ML of New Jersey


The type system of Standard ML of New Jersey [AM89] assigns an integer rank to each
imperative type variable. We write these ranks as superscripts on the type variables. A rank 0
imperative type variable u⁰ occurring within the type of an expression at the top-level indicates
that the type of some existing mutable object already contains u⁰ and therefore u⁰ should not
be generalized. Such a type variable is said to have entered the mutable store typing. The
positive rank d (d > 0) of an imperative type variable uᵈ occurring within a function type denotes the
number of applications after which uᵈ will enter the store typing. Therefore uᵈ is allowed to be
generalized for up to d − 1 partial applications involving the function type, where each partial
application reduces its rank by one. This scheme is extended to typing objects enclosed within
λ-abstractions by keeping track of the number of applications necessary to make them enter the
store typing. The resulting type system is slightly more complex than that of Standard ML but is still
relatively easy to implement and has recently been shown to be sound [HMV93].

Without going into details, it should be clear that this modification handles the function
list_identity in Example 2.6 quite well. The type of the function imp_map is now inferred to
be ∀u₀²u₁². (u₀² → u₁²) → (list u₀²) → (list u₁²), where the superscript 2 denotes that the actual
allocation of imperative objects in the function’s body does not take place until after the second
application. Therefore, the type of list_identity is inferred to be ∀u₀¹. (list u₀¹) → (list u₀¹),
where the superscript 1 denotes the fact that one more application of this function will create
some fresh mutable memory locations.
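
The rank bookkeeping for this example can be mimicked with a short Python sketch. It is a deliberately reduced model, assuming that a ranked variable is just a name with an integer and that the hypothetical helper apply_once captures the effect of one partial application; it is not the SML/NJ implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RankedVar:
    name: str
    rank: int   # applications remaining before the variable enters the store typing

def apply_once(var):
    """One (partial) application lowers the rank by one, never below zero."""
    return RankedVar(var.name, max(var.rank - 1, 0))

def may_generalize(var):
    """Rank-0 variables already occur in the store typing and stay monomorphic."""
    return var.rank > 0

u_imp_map = RankedVar("u0", 2)             # as in the type of imp_map
u_list_identity = apply_once(u_imp_map)    # list_identity = imp_map identity
u_result = apply_once(u_list_identity)     # list_identity (1:2:nil)
```

In this model the variable of list_identity has rank 1 and may still be generalized, whereas after the second application it has rank 0 and may not.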

Unfortunately, the simple ranking mechanism outlined above is still not sufficient to deal 
with imperative higher-order applications as shown in Example 2.5. The type system does 
not have any way to characterize when and how to incorporate “potentially” imperative type 
information from arguments of higher-order functions within their final result. Therefore, the 
type system must conservatively assume that all imperative functions generate mutable objects 
when passed as arguments to higher-order functions. The following comparison illustrates this
point:

Example 2.7:
def ap_nil f = f nil;          % ap_nil :: ∀t₀t₁. ((list t₀) → t₁) → t₁
foo = ap_nil mkref;            % foo :: (ref (list u₀⁰))
mkref’ = identity mkref;       % mkref’ :: u₀⁰ → (ref u₀⁰)


Here, the imperative function mkref is passed as an argument to two polymorphic functions
ap_nil and identity. In the first case (identifier foo), the type of the application is correctly
inferred to be non-polymorphic because it actually creates a fresh mutable reference. But in
the second case (identifier mkref’), the type of the application is unnecessarily non-polymorphic
because the mkref function is never applied within the body of the identity function. The
type of the identifier mkref’ should in this case be ∀u₀¹. u₀¹ → (ref u₀¹), which is identical to the
type of the constructor mkref in this system. The problem is that the type system has no way
of knowing that the function ap_nil applies its parameter f to one argument and therefore
may potentially create mutable references, while identity passes its parameter unchanged and
therefore cannot create any mutable references. Hence, the type system must conservatively
assume that all imperative functions create mutable references when passed as arguments.

The formalization of the above type system presented in [HMV93] is somewhat more
powerful than the SML/NJ compiler implementation and it can deal with the above situation
correctly. However, it requires a more complicated mechanism for rank book-keeping and uses
rank variables instead of fixed integral ranks. It also entails a more complicated type unification
mechanism that needs to resolve algebraic constraints on rank variables. The interested reader
is referred to [HMV93].


2.2.4 Limitations of the Standard ML Type Systems 


Although the two type systems presented above cover a lot of practically useful cases of im- 
perative programming, they are still not sufficiently powerful for our purposes. Ultimately, 
we intend to smoothly convert mutable types into functional types, so our type system must 
not only propagate the mutable type information properly where necessary, but also keep it 
self-contained and easy to manipulate. The problem of type variable contamination as shown 
in Examples 2.4 and 2.6 is a serious one in this regard. None of the systems presented above 
have the ability to assign the same polymorphic type to functions identity and identity’ or
functions fn_map and imp_map. At some observational level, these functions are equivalent but
the internal implementation of the imperative versions shows up in their type and hence they 
are not interchangeable with respect to these type systems. 

Another fundamental problem with the above systems is that they concentrate on modeling 
“imperativeness” of objects only to the extent it affects their type polymorphism. For instance, 
there is no difference between the type of a record that contains a functional integer field and the 
one that contains a mutable integer field. Since we are ultimately interested in approximating 
dynamic mutability of all objects by means of their static types (whether polymorphic or not), 
the partial modeling offered by the type systems above is also unsatisfactory for our purposes. 

Both observations above show that tying the “imperativeness” of mutable objects to the 
kind of type variables contained in their types is rather simplistic and imprecise. We will now 
look at some type systems where this information is tracked independently, leading to a much 
more complete and cleaner characterization of imperative objects. 


2.2.5 Effect Systems 


Effect systems are a broad class of polymorphic typing systems that use static type-checking and 
inference techniques to model the dynamic behavior of programs written in imperative languages 
[Luc87, LG88, TJ92, Wri92]. Originally, such systems were used to collect and propagate 
side-effect information across program fragments for compiler optimizations and parallelization 
[LG88]. One such type and effect system was successfully used in the FX-87 language [GJLS87] 
which supported explicitly declared type polymorphism. 

More recently, automatic type and effect inference techniques have been developed [TJ92, 
Wri92] that use the effect propagation mechanism to infer types that model polymorphic imper- 
ative objects more accurately than the systems given above. As we will see shortly, such type 
and effect systems can be viewed as a logical extension of the type systems described above. 


Effect Analysis 


Probably the most appealing aspect of effect systems is their uniform and integrated mechanism 
of type and effect information propagation across all function and local block boundaries. The 
key idea is that every expression generates a read/write/allocate effect which is accumulated 
along with its type. The effect of the body of a function, parameterized by the effect of its formal 
arguments, is summarized as the latent effect of the function on the arrow type-constructor
(→) in its type. Functions by themselves have no immediate effect. Unknown latent effects for
functional parameters of higher-order functions are modeled using effect-variables. This effect 
parameterization permits a clean way of computing the overall effect of a function application 
by instantiating its latent effect by the effects of its actual arguments. The effect information 
propagated and accumulated in this manner may then be used to accurately identify the creation 
of polymorphic imperative objects and avoid their unsafe generalization. 

In one of the simpler effect systems proposed by Wright [Wri92], all type variables present 
in the type of a freshly allocated mutable data-structure are collected as part of the effect of 
that allocation. The explicit effect computation and propagation mechanism obviates the need 
to mark such type variables as imperative. Unsound typings are then avoided by disallowing 
generalization of type variables that occur in the immediate effect of an expression. This system 
still does not deal with the issues of imperativeness and type polymorphism independently, but 
at least the information flow across higher-order function boundaries is improved because of 
the effect propagation techniques.
As an example, consider the function fn_map shown below: 


Example 2.8:
% fn_map :: ∀t₀t₁f₀. (t₀ -{f₀}→ t₁) -{∅}→ (list t₀) -{f₀}→ (list t₁)
def fn_map f nil = nil
 |  fn_map f (x:xs) = f x : fn_map f xs;

% mkref :: ∀t₀. t₀ -{alloc(t₀)}→ (ref t₀)
% ref_list :: (list (ref (list t₂))) with immediate effect {alloc(list t₂)}
ref_list = fn_map mkref (nil:nil);


f₀, f₁, ... are effect-variables which may be substituted for any effect. ∅ denotes the null
effect. Effect-variables are allowed to be generalized and instantiated just like type variables.
The type of the function fn_map illustrates the use of these effect variables. The latent effect
of the mapped function is captured in the effect variable f₀ that is exposed in the final effect
of the fn_map function.

The example also shows the type of the reference allocator function mkref. The latent effect
annotating the arrow (→), written here in braces, shows that the function allocates a mutable object of type t₀.
As shown in the example, the effect of mapping mkref to a polymorphic list instantiates and
exposes its latent effect of allocating mutable cells containing polymorphic objects. Since t₂ is
present in the immediate effect of the expression creating ref_list, it cannot be generalized.

This system infers the type-scheme ∀t₀t₁f₀. (t₀ -{f₀}→ t₁) -{∅}→ (list t₀) -{{t₀, t₁} ∪ f₀}→ (list t₁) for the
function imp_map of Example 2.6 (cf. fn_map of Example 2.8). Note that the first application
has no effect, and the second application records the effect of allocating new local memory
references for internal identifiers arg and res (as a set of type variables t₀, t₁ occurring in those
reference types) as well as the effect of applying the argument function f (captured via the
effect-variable f₀). Thus, in this system, partial curried applications do not expose the final
effects prematurely, but the problem of type contamination by unnecessarily exposing local
effects still exists.
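
The way latent effects are released by applications and then block generalization can be sketched as follows. This is a simplified reading of the scheme, assuming that an effect is just a set of type-variable names and that the hypothetical helper apply_effect receives the already instantiated latent effect of the arrow being applied.

```python
def apply_effect(fun_effect, arg_effect, latent_effect):
    """Immediate effect of an application: the effects of evaluating the
    function and the argument, plus the latent effect released by the call."""
    return set(fun_effect) | set(arg_effect) | set(latent_effect)

def generalizable(type_vars, immediate_effect, env_vars=()):
    """Variables occurring in the immediate effect must not be generalized."""
    return set(type_vars) - set(immediate_effect) - set(env_vars)

# list_identity = imp_map identity: the first arrow of imp_map carries the
# null latent effect, so nothing is exposed by the partial application.
effect_partial = apply_effect(set(), set(), set())
vars_partial = generalizable({"t0"}, effect_partial)

# ref_list = fn_map mkref (nil:nil): the second arrow of fn_map carries f0,
# instantiated here with the allocation effect of mkref (modeled as {"t2"}).
effect_ref_list = apply_effect(effect_partial, set(), {"t2"})
vars_ref_list = generalizable({"t2"}, effect_ref_list)
```

The partial application keeps its polymorphism, while the variable t2 of ref_list appears in the immediate effect and is therefore left monomorphic.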


Principal Types and Minimal Effects 


In order to compute the type and the effect of every expression automatically and efficiently, 
one must show that the system admits unique principal types and effects for expressions and 
that they are computable using an efficient inference algorithm. At least two effect-based
systems [TJ92, Wri92] propose such inference mechanisms based on structural unification [Rob65].
The effect system of FX-91 [GJSO91] uses the more complex algebraic unification [JG91] which 
permits unification modulo algebraic identities such as associativity and commutativity. This 
provides more expressive power to the inference system, albeit at the cost of simplicity and effi- 
ciency. Here, we will only discuss the inference system based on standard structural unification. 

The basic idea is to compute the principal types of expressions in the usual way using the 
standard Hindley/Milner type inference mechanism while accumulating a set of constraints 
for the latent effect of all the function types in the program. Then, this constraint set is 
solved separately to obtain the minimal effect of each function in the program. This process 
is not completely straightforward because of the possibility of cyclic constraints created due to 
mutually recursive functions. The following examples illustrate this problem⁷:


⁷The latent effects of functions are represented in this system by a constrained effect-variable. The constraints
Example 2.9:

def f0 x = f1 x;             % f0 :: t₀ -{f₀}→ t₁ with {f₀ ⊇ f₁}

def f1 x = f0 x;             % f1 :: t₀ -{f₁}→ t₁ with {f₁ ⊇ f₀}

def g x = { a = mkref x;     % g :: t₀ -{f₀}→ (list (ref t₀)) with {f₀ ⊇ ({t₀} ∪ f₀)}
            in a:(g x) };

def h x = { a = mkref h;     % h :: int -{f₀}→ int with {f₀ ⊇ {int -{f₀}→ int}}
            in x+1 };


Minimal effects in the above cases are computed by combining the effects of all cyclic
constraints into one and finding the least assignment to effect-variables (starting from the null
effect ∅) that would satisfy all the constraint inequations. Thus, functions f0 and f1 in the above
example are each assigned the null effect ∅ and the function g gets the effect {t₀}. The function
h represents an interesting case. Depending on the desired semantic interpretation of effects,
the least effect satisfying this constraint may be taken to be infinite and such expressions may
be classified as ill-formed (system [TJ92]), or this constraint may be simplified to {f₀ ⊇ {f₀}},
which yields the null effect ∅ as the minimal solution (system [Wri92]).
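
Finding the least assignment that satisfies a set of such inequations amounts to a fixpoint iteration starting from the null effect. The Python sketch below shows one way to do this under simplifying assumptions: effects are finite sets of atoms, each constraint has the form variable ⊇ atoms ∪ variables, and the names f_f0, f_f1, f_g, and f_h are hypothetical labels for the effect-variables of Example 2.9.

```python
def minimal_effects(constraints):
    """Least solution of effect constraints by iteration from the null effect.

    `constraints` maps each effect-variable to a list of lower bounds; each
    lower bound is a pair (atoms, variables) denoting atoms ∪ ⋃ variables.
    """
    solution = {var: frozenset() for var in constraints}
    changed = True
    while changed:
        changed = False
        for var, bounds in constraints.items():
            new = set(solution[var])
            for atoms, variables in bounds:
                new |= set(atoms)
                for other in variables:
                    new |= solution.get(other, frozenset())
            if new != solution[var]:
                solution[var] = frozenset(new)
                changed = True
    return solution

example_2_9 = {
    "f_f0": [((), ("f_f1",))],        # f0: effect ⊇ effect of f1
    "f_f1": [((), ("f_f0",))],        # f1: effect ⊇ effect of f0
    "f_g":  [(("t0",), ("f_g",))],    # g:  effect ⊇ {t0} ∪ own effect
    "f_h":  [((), ("f_h",))],         # h:  simplified form in the style of [Wri92]
}
solution = minimal_effects(example_2_9)
```

Because every step only adds atoms drawn from a finite set, the iteration terminates; the cyclic constraints of f0 and f1 settle at the null effect and g at {t0}.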


Region Analysis and Effect Masking 


Some effect systems also carry out a region analysis of memory allocation and sharing [LG88, 
TJ92]. The static description of an expression also summarizes a conservative approximation of 
the memory regions (locations) manipulated within the expression, in addition to its type and 
effects. If a set of regions is found to be purely local to an expression, i.e., if these regions are not
accessible through a free variable of the expression, and if they are not exported via the result 
of the expression, then the effects associated with those regions may be erased from the overall 
effect of that expression. The idea is that only certain “observable” effects on “visible” regions 
need to be kept, the rest may be safely erased without affecting the semantics of the program. 
This is known as effect masking. This analysis may be able to mask all the side-effects to internal 
data-structures of a procedure which largely alleviates the problem of type contamination. In 
this sense, this scheme is capable of automatically assigning purely functional types for some 
classes of imperative programs. For example, these stronger systems are able to infer the same
type for imp_map and fn_map (namely, ∀t₀t₁f₀. (t₀ -{f₀}→ t₁) -{∅}→ (list t₀) -{f₀}→ (list t₁)) since the
mutable references created within imp_map can be masked.

Region analysis requires a lot of book-keeping to maintain a very fine static notion of the 
mutable store. The benefit of obtaining this additional region information and performing 
effect masking has to be weighed against the extra complexity required to do these analyses in 
a practical language implementation⁸. Furthermore, effect masking does not cover all cases of
effect erasure that we are interested in. For example, the effect generated by the make_vector
function of Example 2.2 cannot be masked since the mutable vector is being returned as the
result of the function. The user can still update this vector and destroy the type polymorphism 
that might result by erroneously masking this effect. 
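
The masking criterion described above can be sketched in Python. The sketch assumes a much coarser model than the cited systems: an effect is a pair of an operation kind and a region name, and the region names r_arg, r_res, and r_vec are hypothetical.

```python
def mask_effects(effects, free_var_regions, result_regions):
    """Keep only effects on regions that are reachable through a free
    variable of the expression or exported through its result."""
    visible = set(free_var_regions) | set(result_regions)
    return {(kind, region) for kind, region in effects if region in visible}

# Body of imp_map: the references arg and res are purely local.
imp_map_effects = {
    ("alloc", "r_arg"), ("read", "r_arg"), ("write", "r_arg"),
    ("alloc", "r_res"), ("read", "r_res"), ("write", "r_res"),
}
imp_map_visible = mask_effects(imp_map_effects, free_var_regions=(), result_regions=())

# Body of make_vector: the mutable vector itself is returned as the result.
make_vector_effects = {("alloc", "r_vec"), ("write", "r_vec")}
make_vector_visible = mask_effects(
    make_vector_effects, free_var_regions=(), result_regions={"r_vec"}
)
```

All effects of imp_map are erased because neither region escapes, whereas the effects of make_vector remain observable since the region r_vec is exported through the result.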


are of the form (effect-variable ⊇ effect) which means that the effect on the right hand side is a lower bound on
the actual effect denoted by the effect-variable on the left hand side.

⁸Indeed, region analysis was dropped from the FX-91 language [GJSO91], which is a more recent version of FX-87
[GJLS87].
Analysis of Effect Systems


On the whole, effect systems seem to be a powerful tool to summarize a variety of dynamic 
behaviors of programs accurately. But we still have to extend the effect masking analysis to meet 
our original goal, which is to transparently encapsulate imperative programs into functional 
abstractions. We also anticipate that external factors such as a user-declared functional interface 
will play an important role in guaranteeing type-soundness in our system in spite of otherwise 
non-maskable imperative effects in the program. None of the existing effect systems incorporate 
such information. In Section 2.4, we will explore some of these ideas where a powerful type 
system is combined with user-supplied information while trading off some of its power for speed 
and simplicity. 


2.2.6 Syntactic Closure Typing System 


All the type systems we have seen so far model the state of the dynamic mutable store and the
operations performed on it using some static approximation, and then use that information to 
identify the objects that can safely be assigned polymorphic types. Instead of approximating the 
dynamic behavior of the program, Leroy and Weis [LW91] introduced a more direct, syntactic 
way of identifying and safely typing mutable objects using an extension of the Hindley/Milner 
type system. We discuss their technique below. 


Syntactic Analysis 


The key idea in the scheme proposed by Leroy and Weis is that the static type of a complex 
object can be used as a clue to its structural shape and dynamic properties (such as mutability) 
of its various components. For example, an i_vector type represents an assignable, I-structure 
array, while a vector type represents a functional array. This is exactly the information required 
to decide what parts of that object’s type can be safely generalized. Note that this information 
is independent of when/where/how the object was created in the program and depends only 
on its static type structure. This analysis relies on the assumption that the type of an object 
remains visible from all places within the program where that object may actually end up. Then 
the generalization scheme is simply that the type variables present within an assignable portion
of a type (such as t₀ within the type (i_vector t₀), (list t₁)) are considered to be dangerous and
are not generalized, while all other type variables occurring elsewhere in the type (such as t₁)
are allowed to be generalized.

The key assumption in the above scheme is that all objects can be viewed as data-structures 
whose component types are reflected back in the type of the overall object. In particular, objects 
captured inside the environment part of a function closure must also be made visible in the type 
of that function. Otherwise the mutability information of a datatype could be lost by placing 
it within a function closure. The following example illustrates this point (cf. Example 2.3):


Example 2.10:
def fnref x =                          % fnref :: ∀t₀. t₀ → (void -{ref t₀}→ t₀, t₀ -{ref t₀}→ void)
  { r = mkref x;
    def read () = r!!mkref_1;
    def write y = { r!!mkref_1 = y };
  in
    read,write };

reader,writer = fnref identity;
_ = writer square;
_ = reader() true;                     % Static Type Error!


The function fnref emulates the functionality of the mutable constructor mkref of Exam- 
ple 2.3 by creating separate read and write handles to a shared mutable reference r. In the 
scheme proposed by Leroy and Weis, the function type-constructors (→) of read and write
functions are augmented with closure types that expose the types of objects captured within 
their closure environments. Without closure types, it is impossible to tell from the types of 
the read and write functions that they share a mutable reference. Thus, closure types help in 
identifying hidden dangerous type variables and therefore avoid their unsound type generaliza- 
tions. In the above example, when the function fnref is used to create the reader and writer 
handles to a hidden mutable reference to identity, their non-empty closure types correctly 
restrict their types to be non-polymorphic and the type-error can be detected. 

In general, closure types for a function must keep track of the type of all the free variables 
occurring within the function body, whether such types are dangerous or not. This is because 
such free variables may correspond to formal parameters of an enclosing function that may ul- 
timately be instantiated with a mutable object at some application site. The following example 
illustrates this point: 


Example 2.11: 


def K x = {fun y = x};          % K :: ∀t₀t₁. t₀ → (t₁ -{t₀}→ t₀)
f = K (mkref identity);         % f :: ∀t₁. t₁ -{ref (t₂ → t₂)}→ ref (t₂ → t₂)


The type of function f correctly generalizes t₁ and not t₂ because t₂ occurs under a mutable
type in its closure. This was possible only because we correctly kept track of the type t₀ of the
free variable x in the closure type of the body of the function K.
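
The computation of dangerous type variables, including those hidden in closure types, can be sketched in Python. The encoding is an assumption made for illustration: types are nested tuples, an arrow is written ("->", argument, closure_types, result), and the dangerous variables of an arrow are taken to be those of its closure types only, which is our reading of the scheme rather than a transcription of [LW91].

```python
import re

VAR = re.compile(r"^t\d+$")          # assumed naming convention for type variables
MUTABLE = {"ref", "i_vector"}        # assignable type constructors

def free_vars(ty):
    """All type variables of a type, including those inside closure types."""
    if isinstance(ty, str):
        return {ty} if VAR.match(ty) else set()
    if ty[0] == "->":
        _, arg, closure, result = ty
        parts = [arg, result, *closure]
    else:
        parts = ty[1:]
    return set().union(*(free_vars(p) for p in parts))

def dangerous_vars(ty):
    """Variables under an assignable constructor, either directly in a
    data-structure or inside the closure type of a function."""
    if isinstance(ty, str):
        return set()
    if ty[0] in MUTABLE:
        return free_vars(ty)
    if ty[0] == "->":
        return set().union(*(dangerous_vars(c) for c in ty[2]))
    return set().union(*(dangerous_vars(p) for p in ty[1:]))

def generalizable(ty):
    return free_vars(ty) - dangerous_vars(ty)

# f = K (mkref identity), with ref (t2 -> t2) captured in the closure.
ref_id = ("ref", ("->", "t2", (), "t2"))
f_type = ("->", "t1", (ref_id,), ref_id)
f_vars = generalizable(f_type)

# The data-structure case: (i_vector t0), (list t1).
pair_vars = generalizable(("pair", ("i_vector", "t0"), ("list", "t1")))
```

For f only t1 is generalized, and for the pair only t1, matching the discussion above; dropping the closure component of the arrow would make t2 look safe, which is exactly the loss of information that closure types prevent.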


Type Soundness and Type Inference Mechanism 


Leroy developed this idea in his thesis [Ler92] showing the soundness of a type system with 
closure types with respect to the dynamic operational semantics of an ML-like language with
higher-order functions. He also showed a type inference algorithm based on this type system 
that is sound and infers principal types and closure types. 

The type inference for closure types turns out to be very similar to effect inference. A new 
class of variables called closure extension variables model the unknown closure types of higher- 
order functions just like effect-variables. There is some flexibility in deciding what closure type 
information really needs to be kept and what can be discarded. For example, it is possible to 
keep only dangerous and certain visible type variables within a closure type of a function instead 
of recording the full types of all its free variables. The algebra of Hindley/Milner types also 
has to be extended to incorporate extensible sets of closure types, including ways to compute 
dangerous type variables of a type and to perform type substitution within closure types. The
type inference mechanism then computes the usual Hindley/Milner types for all objects while
accumulating a set of closure types for every function using simple structural unification.
Interested readers may refer to [Ler92] for details.
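
The computation of dangerous type variables can be illustrated with a small sketch. The Python below is not Leroy's algorithm; it is a simplified model under our own assumptions. The class names Base, TVar, Ref and Fun and the functions free_vars, dangerous_vars and generalizable are hypothetical, closure types are a plain tuple of captured types (no closure extension variables), and the typing environment is ignored when deciding what may be generalized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    name: str             # a base type such as void or int


@dataclass(frozen=True)
class TVar:
    name: str             # a type variable such as t0


@dataclass(frozen=True)
class Ref:
    elem: object          # ref (elem): a mutable reference type


@dataclass(frozen=True)
class Fun:
    arg: object
    res: object
    closure: tuple = ()   # types of the free variables captured by the function


def free_vars(t):
    """All type variables occurring anywhere in t, closure types included."""
    if isinstance(t, Base):
        return set()
    if isinstance(t, TVar):
        return {t.name}
    if isinstance(t, Ref):
        return free_vars(t.elem)
    out = free_vars(t.arg) | free_vars(t.res)
    for captured in t.closure:
        out |= free_vars(captured)
    return out


def dangerous_vars(t):
    """Type variables that occur under a ref inside a value of type t."""
    if isinstance(t, (Base, TVar)):
        return set()
    if isinstance(t, Ref):
        return free_vars(t.elem)
    # A function value contains only what it captured; its argument and
    # result do not exist yet, so they contribute nothing here.
    out = set()
    for captured in t.closure:
        out |= dangerous_vars(captured)
    return out


def generalizable(t):
    """Variables that may be generalized (the environment is ignored here)."""
    return free_vars(t) - dangerous_vars(t)


# Example 2.11: f captures x :: ref (t2 -> t2) in its closure.
t0, t1, t2 = TVar("t0"), TVar("t1"), TVar("t2")
cell = Ref(Fun(t2, t2))
f_type = Fun(t1, cell, closure=(cell,))
print(sorted(generalizable(f_type)))      # ['t1']

# The read handle of fnref captures r :: ref t0, so nothing is generalized.
reader_type = Fun(Base("void"), t0, closure=(Ref(t0),))
print(sorted(generalizable(reader_type)))  # []
```

Dropping the closure component from f_type would make t2 look generalizable, which is exactly the unsoundness that closure types are meant to rule out.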


Analysis of the Closure Typing System 


Leroy’s syntactic system also succeeds in giving the same polymorphic type to imp_map and 
fn_map functions just like the effect-based systems with effect masking. In his thesis [Ler92], 
Leroy makes some interesting comparisons of the expressive power of the various systems we
have seen so far. His system turns out to be incomparable to the effect-based systems in 
power. This is not too surprising because his approach is semantically very different from the 
effect-based systems. 


2.2.7 Choosing an Imperative Type System 


As mentioned earlier, we have chosen the closure typing system of Leroy as a starting point 
for the typing extensions proposed in this thesis. In this section, we attempt to motivate this 
choice in the context of the various type systems we have seen above. 

The real choice is between an effect-based system (Section 2.2.5) and the closure typing system
(Section 2.2.6). Type systems of Sections 2.2.2 and 2.2.3 and their extensions are either too 
simplistic in that they do not deal with higher-order functions properly or they suffer from the 
problem of type contamination. 

A requirement imposed by our ultimate goal to selectively convert some imperative objects 
into functional ones is that we should be able to uniquely label groups of imperative objects, 
recognize them independent of other objects, and track their movement within the program. 
Some sort of region-based analysis is necessary for this purpose. Either an effect-based system 
with regions may be used, or we may have to extend the closure typing system with regions. 

The contrast between the closure typing and the effect-based approaches may be understood 
by examining the way in which imperative type information is collected and propagated. In the 
closure typing system, the type of an object directly describes its imperative or functional com- 
position. This is purely static, locally determinable, object-based information. This property is 
extended even to functions where closure types are added to function types in order to describe 
the data captured within the closure environment. This is very appealing because at any given 
moment, all the relevant information about an object is available from its type wherever that 
object (and hence its type) travels. We say that this approach is data-driven since it keeps the 
relevant properties directly attached to the types of the data objects. 

On the other hand, in an effect-based system the objects themselves are not classified as
imperative or functional. We collect the operations performed on various kinds of objects in
a separate effect. Such effects are carried over object manipulators (functions) as their latent
effect. At any given moment, the properties of an object can be ascertained indirectly by
examining the kind of effects it is participating in. Such a system is very good at summarizing
dynamic properties of program fragments rather than describing the data itself. We say that 
this approach is control-driven since it keeps the relevant properties attached to types of control 
objects (functions). 

For our purpose, the data-driven approach is more direct and natural, since we are interested 
in determining and manipulating the imperative or functional nature of data objects directly. 
We need not separately keep track of the dynamic properties of the functions manipulating 
these objects. A sound, functional abstraction of an object can be built simply by changing 
the type of the object regardless of the way it is computed. Additional user information about 
an object should also be easy to incorporate into this system as long as we can show that such 
information preserves the type-safety of the static semantics. For these reasons we have
chosen this type system as the basis of our extensions for converting imperative objects into 
functional ones, which we refer to as “closing” the imperative objects. 
2.3 Closing Imperative Data-Structures


In this section we will informally describe what we mean by “closing” an imperative object and 
discuss several technical issues arising out of it. 


2.3.1 A Proposal for “Close” 


We observed in Section 2.1 that the returned array from Example 2.2 is mutable and must be 
assigned a restricted form of polymorphism. This restriction is necessary to achieve the desired 
type-safety in the following example: 


Example 2.12:
def fill n = identity;

a = make_vector fill (1,u);          % a :: (vector (t0 -> t0))
a[i] = square;                       % a :: (i_vector (int -> int))
_ = a[j] true;                       % Static Type Error!


The Hindley/Milner type of the returned array is shown on the right where the type variable
t0 occurs free and is not generalized. The assignment “a[i] = square;” refines the type of
the array a as shown which correctly generates a type-error on encountering the subsequent
application to true. This is necessary because the indices i and j may turn out to be
the same at run-time, in which case this application would lead to a run-time type-error. All
imperative type systems in the literature [Dam85, Tof90, AM89, LW91, Ler92, TJ92, Wri92] 
catch this type-error at compile-time by restricting the polymorphism of imperative objects in 
one way or another. 


“Close” as a Type Converter 


Although the above behavior for make_vector is correct, ultimately, we want it to behave like 
a functional array constructor that returns a non-mutable, polymorphic array. The interesting 
observation is that if we convert the type of the returned array from make_vector to be the 
functional type constructor vector, then all mutation operations on it are automatically made 
illegal since it must now be viewed as a functional object. In this case, we would have flagged 
a semantic error at the assignment “a[i] = square;”. Since no more mutations are allowed
on the array a, we may be able to safely generalize its type with respect to t0. Henceforth,
we will refer to this type conversion and subsequent type generalization as “closing” an
imperative object.

We can rewrite the make_vector implementation to reflect the above strategy:


Example 2.13:
make_vector :: ∀t0. (int -> t0) -> (int, int) -> (vector t0)
def make_vector f (l,u) =
    close { a = i_vector (l,u);
            _ = { for i <- l to u do
                    a[i] = f i };
          in a };


The close construct in this implementation is intended to be a special form that captures 
our notion of closing an imperative object. It provides an alternate “functional view” for the 
imperative object. Users may use this construct in their programs to convert an imperative 
data-structure (like the array a above) into a functional one. Or, such conversions may be
issued automatically by a compiler while desugaring high-level functional constructs into low- 
level imperative program fragments. In either case, the type of the object being closed is 
converted from a mutable to a non-mutable type constructor that permits its subsequent type 
generalization. 


“Close” as an Encapsulator 


An important point to observe in Example 2.13 is that the close construct encapsulates the 
entire computation that allocates, fills, and returns the array a rather than acting merely as 
a marker for the array to be closed. Treating the close construct as an encapsulator clearly 
identifies the “scope” of the imperative operations being performed on the array a. Within this 
scope, imperative operations on the array are permitted, while outside this scope, the array 
is viewed functionally. This notation is useful both to the user, by providing a clear visual 
separation between the imperative and functional parts of the program, and to the compiler, 
that may need to compile these parts differently as well as verify the correctness of the close 
operation automatically. This implies that the following two expressions are not equivalent: 


close exp ≠ { x = exp; in close x }


Here, exp stands for an imperative program fragment that allocates and prepares an imperative 
object for closing. The close construct on the left-hand-side behaves like an encapsulator: it 
encapsulates the entire program fragment that builds the object imperatively and then returns 
it with a functional view. There is a clear separation between the imperative and the functional 
views of the object. In contrast, the close construct on the right-hand-side identifies the object to
be closed, but it does not clearly identify the program region where the close operation should
take effect. Thus, it becomes difficult for the type system to verify the correctness of the close 
operation. The importance of this distinction will become clear shortly. 

As a matter of notation, when only some of the objects being returned from an expression 
are to be closed, we specify it in a type annotation for the entire expression, where some of the 
components are only partially supplied: 


Example 2.14:
close { a = i_vector (1,n);
        b = i_vector (1,n);

      in a, b } :: (vector _),_;


The underscore (_) within the annotation implies that the close operation does not apply to 
that particular component of the result. All other components of the result are closed according 
to the type specified. Thus, in the above example, the array a is closed into a functional vector 
while the array b remains open. The contents of the array a also remain unaffected. The exact 
details of this specification appear in Chapter 4. 


2.3.2 Guaranteeing Type-Safety 


The problem of closing imperative objects is not simply a matter of type conversion as it might 
appear from the above discussion. Note that the closing operation is type-safe only if the object 
does not escape the scope of the imperative implementation in any other way except via some 
controlled, safe paths. We saw above that if the only access to a polymorphic imperative object 
is through the returned result, then a type conversion allows us to do a type-safe generalization
later on. But there are several other ways in which an object might escape a given scope, 
some of which are shown in the following example. Note that using the close construct as an 
encapsulator helps in identifying the escaping objects clearly: 


Example 2.15:
def escape_1 n =
    close { a = i_vector (1,n);
            b[i] = a;                 % Storing into an external data structure
          in a };

def escape_2 n =
    close { a = i_vector (1,n);
          in a, a } :: (vector _),_;  % Returning unconverted object directly

def escape_3 n =
    close { a = i_vector (1,n);
            def g i v =               % Returning a write handle within a closure
              { a[i] = v; in v };
          in a, g } :: (vector _),_;


In function escape_1, a reference to the locally allocated imperative array a is stored into an
external array b. The type of the array b is constrained to be (i_vector (i_vector t0)) implying
that the array a is still accessible in its open form through this indirection. In function escape_2,
two references to the same array are returned: one is closed and the other is left open according 
to the specified annotation pattern. Mutations via the open reference will affect the type-safety 
of the closed version. The same effect is achieved in the function escape_3, although it is 
disguised in the form of a function that provides a write handle to the array being closed. 

The essential problem in the above examples is that it is safe to close a polymorphic mutable 
data-structure only if it is guaranteed that no write handle pointing to that object remains 
accessible to the user after it has been closed. Otherwise, the subsequent functional behavior 
implied by the close operation and its possible type generalization will both be unsound. 
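
The danger of a surviving write handle can be seen in a dynamic analogue. The following Python sketch is only an illustration under our own assumptions: ReadOnlyVector is a hypothetical read-only view standing in for a closed object, and Python performs none of the static checking discussed here (its underscore-prefixed attribute is private by convention only).

```python
class ReadOnlyVector:
    """Read-only view of a list; it shares, and does not copy, the storage."""

    def __init__(self, cells):
        self._cells = cells

    def __getitem__(self, i):
        return self._cells[i]

    def __len__(self):
        return len(self._cells)


def make_vector(f, n):
    # Safe case: the view is the only handle that leaves this scope.
    a = [None] * n
    for i in range(n):
        a[i] = f(i)
    return ReadOnlyVector(a)


def escape_3(n):
    # Unsafe case: a write handle escapes inside the closure g.
    a = [0] * n

    def g(i, v):
        a[i] = v
        return v

    return ReadOnlyVector(a), g


squares = make_vector(lambda i: i * i, 4)
print(squares[2])          # 4

closed, g = escape_3(3)
before = closed[1]
g(1, 42)                   # mutation through the escaped handle
print(before, closed[1])   # 0 42
```

The view returned by make_vector cannot be updated by its holder, whereas the view returned by escape_3 changes underneath its reader. A type system that accepts close must therefore reject programs shaped like escape_3.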

All the imperative type systems in the literature automatically take care of such cases by 
avoiding generalization of imperative objects at all times. The trouble arises when we wish 
to force the type system to accept a functional, polymorphic type for an imperative object as 
implied by the close construct in the above examples. Then, either the user must be held 
responsible for the type-safety of the resulting program, or it becomes the responsibility of the 
type system (the compiler) to automatically verify the soundness of this transformation and 
reject the unsafe cases. 


2.3.3 Guaranteeing Non-Mutability


Note that type-safety is an issue only for polymorphic imperative objects, i.e., imperative objects
that have some potentially generalizable type-variables in their type. This is because the usual 
typing rules would ensure that all values assigned to monomorphic mutable objects would have 
compatible types. For example, all the functions in Example 2.15 would be type-safe if assumed 
monomorphic even if the array being returned was subsequently mutated. 
However, our intended meaning of the close operation is more than simply ensuring safe
type generalizations. We want to enforce non-mutability of the returned data-structure which 
is a much stronger property of dynamic semantics compared to the weaker property of merely 
avoiding run-time type-errors (type-safety). For polymorphic objects, non-mutability implies 
type-safety and vice versa, but that is not the case for monomorphic objects. As the preceding 
discussion shows, ensuring non-mutability involves a simple form of escape analysis on the 
part of the compiler which is conventionally performed using dataflow analysis or abstract 
interpretation [GP90, GPG91, HI89]. Indeed, all the imperative type systems in the literature 
concentrate on the issue of type-safety alone. 

In our case, we intend to model such a simple form of escape analysis for free using the
existing machinery of our type system that is already required to ensure its type-safety. Our 
machinery ensures true functional semantics for successfully closed objects, i.e., such objects are 
guaranteed to be side-effect free and can participate in compiler optimizations such as common 
sub-expression elimination and code-hoisting that depend upon the objects being functional. 
Thus, the close construct serves as a true interface between the low-level, imperative layer and 
the high-level functional layer of the language. 


2.3.4 Efficiency and Parallelism 


Consider the following example adapted from [BNA91] that builds an n-bucket functional
histogram of objects stored in a binary search tree. The search tree datatype is also shown below
for convenience: 


Example 2.16:
type tree t = leaf | node t (tree t) (tree t);

def histogram t n =
    close { a = m_vector (1,n);
            _ = { for i <- 1 to n do
                    a![i] = 0 };
            _ = accum t a n;
            ---
          in a };

def accum leaf a n = ()
 |  accum (node x l r) a n =
      { i = hash x n;
        a![i] = a![i] + 1;
        _ = accum l a n;
        _ = accum r a n; };


The histogram function allocates an empty mutable vector with n buckets and initializes 
each of the buckets to zero. The accum function uses pattern-matching to traverse the tree 
structure recursively and increments the count in the appropriate bucket.⁹

A couple of important observations can be made about the above example. First, all
accumulations are made to the same mutable array which is closed and returned at the end. No
copying is involved during accumulations or at the time of returning the final array. Most
strongly-typed systems would only allow creating an internal mutable array to which
accumulations are made, then copy the final tallies to a functional array which is returned. Hence,
overall functional behavior is achieved at the cost of copying the final data-structure which
may be quite expensive. The close construct automatically achieves the functionality without
sacrificing the efficiency in such cases.


⁹The notation “a![i]” in Id denotes M-take/M-put operations on mutable arrays with
read-and-lock/write-and-unlock semantics. The notation “---” denotes a local barrier. All the
computation above the barrier must terminate before any of the computation below the barrier is
allowed to proceed. See [BNA91, Bar92] for details.

Second, all computations in Id are performed in parallel by default, constrained only by 
data-dependencies. In the above example, the histogram initialization and the entire tree ac- 
cumulation can potentially be done in parallel. The close construct places no restrictions on 
the kind of parallel activities that can occur within the encapsulated expression — it simply 
closes and returns the final result. In a purely functional setting, some compilers would per- 
form extensive destructive update analysis, linearity analysis, use linear type systems, abstract 
datatypes or monadic language constructs [Blo89, Wad90, Hud92, PJW93, LPJ94] to deter- 
mine that the histogram may be safely single-threaded through the computation and hence 
modified in place. Not only does this require a lot of compiler analysis, but single-threading 
the computation completely destroys the parallelism inherent in the problem. 


2.3.5 Termination of Side-Effects before “Close” 


Example 2.16 illustrates another important point. Given the parallel execution model of Id, 
we must wait until all the accumulations have completed before closing and returning the his- 
togram array in order to guarantee that the returned array is not updated anymore. This is 
ensured by inserting a local barrier (---) before returning the histogram which waits for all the 
computations before the barrier (issued in the current scope) to terminate before proceeding 
to the computations after the barrier. The barrier may be considered as an independent syn- 
chronization operation necessary for closing mutable objects in the presence of parallel updates 
(as shown here), or it could be taken as part of the close operation itself. In the latter case, 
the close construct would behave like a strict encapsulator that releases the closed object 
only when the encapsulated computation has completely terminated, rather than as a mere 
type-converter. 
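
The role of the barrier can be sketched with ordinary threads. This Python analogue is an illustration under our own assumptions, not Id's execution model: the per-bucket locks stand in for M-take/M-put, the call to wait plays the part of the local barrier, and the function and parameter names are hypothetical. Unlike close, the final tuple() copies the data; Python offers no zero-copy way to freeze a list.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def histogram(items, n, workers=4):
    counts = [0] * n
    # One lock per bucket, standing in for read-and-lock / write-and-unlock.
    locks = [threading.Lock() for _ in range(n)]

    def accum(x):
        i = hash(x) % n
        with locks[i]:
            counts[i] = counts[i] + 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = [pool.submit(accum, x) for x in items]
        # Barrier: every update issued in this scope must have terminated
        # before the result is released to the outside.
        wait(pending)
        for p in pending:
            p.result()      # re-raise any error from a worker

    return tuple(counts)    # read-only result handed to the caller


print(histogram(range(100), 10))
```

Releasing counts before the wait would let a caller observe a histogram that is still being updated, which is the situation the barrier is there to prevent.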

The readers may have noticed that we did not use barriers in Examples 2.13 and 2.15. 
This is because of the different underlying memory access protocols being used for the objects 
in those examples. Examples 2.13 and 2.15 use I-structure arrays, while Example 2.16 uses 
an M-structure array. A barrier may be necessary when the memory access protocol used for 
implementing an imperative object is not the same as that of the corresponding closed object. 
We discuss the various memory access protocols below. 


Memory Access Protocols 


Id defines three classes of data-structures at the language level: Functional, I-structure, and
M-structure. Functional data-structures are read-only, I-structures are write-once, and M-structures
allow multiple updates. At the architecture level, these data-structures map into the
following three kinds of memory access protocols: 


Unsynchronized Memory Access — This is the ordinary load/store memory access used 
in conventional architectures. Each memory transaction is assumed to be exclusive and 
non-blocking. There is no synchronization of any kind between readers and writers. 


I-Structure Synchronization — The I-structure protocol [ANP89] enforces producer-consumer
synchronization between a single writer and multiple readers using full/empty presence
bits on memory locations. A location is deemed empty initially. Multiple readers may
issue I-fetches all of which block until the single writer performs an I-store changing the
state of the location to full. The stored data is then distributed to all the blocked and
subsequent readers. Multiple writes to the same location are considered to be an error.


M-Structure Synchronization — The M-structure protocol [BNA91, Bar92] enforces mutual- 
exclusion synchronization among multiple readers and writers. Readers issue M-take op- 
erations on full memory locations that read the location and leave it empty. A subsequent 
M-put on the location restores the status to full and makes the data available to other 
readers. It is possible to allow only one outstanding M-put operation and several M-takes 
waiting to succeed as done in Id, or one could queue up both M-takes and M-puts and 
match them up. 
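
The two synchronized protocols can be modeled in software for a single location. The Python sketch below is a rough emulation under our own assumptions, not the hardware mechanism: the class names ICell and MCell are hypothetical, a condition variable replaces the presence bit, an M-structure location is taken to start empty, and an M-put on a full location is treated as an error (the single-outstanding-put variant).

```python
import threading


class ICell:
    """Write-once location: fetches block until the single store arrives."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def i_store(self, value):
        with self._cond:
            if self._full:
                raise RuntimeError("multiple I-stores to the same location")
            self._value, self._full = value, True
            self._cond.notify_all()          # release every blocked reader

    def i_fetch(self):
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            return self._value               # the location stays full


class MCell:
    """Mutual-exclusion location: take empties it, put refills it."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def m_take(self):
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False               # leave the location empty
            return self._value

    def m_put(self, value):
        with self._cond:
            if self._full:
                raise RuntimeError("M-put on a full location")
            self._value, self._full = value, True
            self._cond.notify_all()          # waiting takers re-check the state


# Usage: a take/put pair acts as an atomic read-modify-write.
counter = MCell()
counter.m_put(0)


def bump(times):
    for _ in range(times):
        counter.m_put(counter.m_take() + 1)


workers = [threading.Thread(target=bump, args=(100,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
total = counter.m_take()
print(total)   # 400
```

An ICell that has been stored behaves like read-only data from then on, which is why it is a natural target representation for closed objects.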


I-structure and M-structure objects are implemented using their respective memory access 
protocols, but functional objects may be implemented using either unsynchronized or I-structure 
access protocol. However, intuitively it should be clear that a given object cannot be accessed 
using two different protocols simultaneously — that would lead to a run-time error. Therefore, 
it becomes necessary to ensure that all in-flight imperative operations on an object have ter- 
minated before it is closed and accessed as a functional object. A barrier may be inserted just 
before the close operation in order to guarantee this. 

Note that we only have to wait for the termination of all memory operations issued from 
within the scope of the close construct because we already ensure that no imperative handle to 
the object being closed can escape this scope. Of course, no barrier is needed if the underlying 
memory access protocol remains the same when changing from an imperative to a functional 
view of the same object. For instance, currently the Id compiler uses the I-structure protocol 
to implement all functional objects. Therefore, no barrier is needed when closing I-structure 
objects into functional objects (Examples 2.13 and 2.15), whereas a barrier is required when 
closing M-structure objects into functional objects that use the I-structure protocol (Exam- 
ple 2.16). 


Protocol Conversions 


Figure 2.1 depicts all possible protocol conversions at the time of closing an object. An imper- 
ative object may be implemented using any one of the three memory access protocols, while 
a functional object may use either the unsynchronized read protocol or the I-structure read 
protocol. The arrows depict the protocol conversion implied by the close operation. The 
annotations on the arrows summarize the kind of barrier required, if any, for the underlying 
protocol conversion. We discuss the various cases below. 

When closing an unsynchronized mutable object into an unsynchronized functional object 
(refer to Figure 2.1), we need to make sure that all previously issued write operations have 
terminated. Otherwise, the closed object may get updated after being closed. This is enforced 
by using a write-barrier before closing the mutable object. 

Although the Id compiler uses the I-structure protocol to implement all functional objects,
it is possible to implement functional objects that are known to be strict without any
synchronization. It is also possible to introduce unsynchronized objects as another primitive data class
within the language that need not pay the significant overhead of I-structure synchronization,
especially when it is emulated in software. In this situation, a write-barrier is necessary when
closing an I-structure object into an unsynchronized object. Otherwise, subsequent unsynchronized
read operations would not see the effect of any outstanding I-store operations. However,
any outstanding I-fetch operations can always be satisfied using the data that is already present,
therefore we need not wait for any outstanding I-fetch operations to terminate before closing
the object.


Mutable Object              Functional Object Synchronization Protocol
Synchronization Protocol    Strict, Unsynchronized    I-Structure

Unsynchronized              Write-Barrier             -
I-Structure                 Write-Barrier             No Barrier
M-Structure                 Full Barrier              Single Outstanding Put: Take-Barrier
                                                      Multiple Outstanding Puts: Full Barrier


Figure 2.1: Conversions among Synchronization Protocols at the time of Closing.

When closing M-structure objects into unsynchronized functional objects, it is clear that 
we must wait for both M-take and M-put operations to terminate before accessing the object 
with unsynchronized read operations. This is because both M-take and M-put may modify the 
actual contents of the memory location and all such modifications must complete before it is 
safe to use the object functionally. 

We already mentioned that the I-structure protocol is currently used within the Id compiler 
to implement both functional and I-structure objects. The only difference between the two 
at the language-level is that functional objects are allocated and completely defined at the 
same time and then subsequently used in a read-only fashion, while I-structure objects may be
allocated and then independently filled via assignment anywhere within the program. Since the
underlying synchronization protocol is the same in both cases, no barriers are necessary when
closing an I-structure object into a functional object.

Finally, while closing M-structure objects into functional objects that are implemented using 
the I-structure protocol, if only one outstanding put is allowed, then it is possible to use only a 
take-barrier [Bar92] instead of the usual full barrier. This is because once a location is empty 
after a successful M-take operation, multiple functional I-fetches may be allowed to queue up
and the ensuing M-put can be made to satisfy them just like an I-store would.
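
The cases discussed above can be collected into a small lookup. This Python sketch merely restates the text; the names Protocol, Barrier and barrier_for_close are hypothetical, and the conversion from an unsynchronized mutable object to an I-structure functional object, which is not discussed above, is deliberately left undefined.

```python
from enum import Enum


class Protocol(Enum):
    UNSYNCHRONIZED = "unsynchronized"
    I_STRUCTURE = "I-structure"
    M_STRUCTURE = "M-structure"


class Barrier(Enum):
    NONE = "no barrier"
    WRITE = "write-barrier"
    TAKE = "take-barrier"
    FULL = "full barrier"


def barrier_for_close(source, target, single_outstanding_put=True):
    """Barrier needed when a mutable object using `source` is closed into a
    functional object using `target`."""
    if target is Protocol.UNSYNCHRONIZED:
        if source is Protocol.M_STRUCTURE:
            return Barrier.FULL      # both M-takes and M-puts modify memory
        return Barrier.WRITE         # pending stores must land first
    if target is Protocol.I_STRUCTURE:
        if source is Protocol.I_STRUCTURE:
            return Barrier.NONE      # same protocol before and after
        if source is Protocol.M_STRUCTURE:
            return Barrier.TAKE if single_outstanding_put else Barrier.FULL
    raise ValueError(f"conversion {source.value} -> {target.value} is not covered")


print(barrier_for_close(Protocol.M_STRUCTURE, Protocol.I_STRUCTURE).value)
```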


Discussion 


Since there are so many possibilities due to variations in data classes, synchronization proto- 
cols and their implementations, henceforth, we shall assume that the close construct is always 
explicitly or implicitly accompanied with the appropriate barrier where necessary. The main 
thrust of our research is to guarantee type-safety and dynamic non-mutability via static anal- 
ysis which is orthogonal to the issue of guaranteeing dynamic termination of parallel update 
operations upon closing an object. Therefore, in the rest of this thesis, we will only concentrate
upon strict, sequential, unsynchronized accesses to memory as done in most conventional
languages.


2.4 Sound Typings for Imperative/Closed Objects 


As discussed in the last section, our overall strategy for closing imperative objects can be 
summarized as follows: 


1. First, we have to model the “imperativeness” of objects within the type system. 


2. Next, we develop sound verification criteria for the type system under which an object 
can be safely closed. 


3. Finally, we apply the criteria to each object being closed at compile-time, verify the safety 
of closing and convert the type of the object appropriately if the verification succeeds. 
Otherwise, we raise a static “close-error”. 


Following the above outline, in this section we informally discuss the typing machinery 
required for describing and closing imperative objects and present a set of closing strategies 
under which this operation can be done safely. These strategies form the basis of the formal 
static and dynamic semantics presented in the next chapter. We also touch upon some language 
design issues that will be discussed in greater detail in Chapter 4.


2.4.1 Modeling “Imperativeness” in Types 


In Section 2.2.7, we motivated our choice of the closure typing system of Leroy as a starting point
for the typing extensions being proposed in this thesis. We also mentioned that we will need 
some sort of region-based analysis in order to distinguish among various kinds of imperative 
and functional objects. In this section, we informally describe this type representation. 

Our approach takes a middle ground between the effect-based systems and the closure typing system of
Leroy. We model the “imperativeness” of an object using parameterized type constructors where
a simple region expression is attached to each type constructor that identifies whether or not
that constructor is imperative. A region expression ρ is either a region variable r, or the null
region ε. The intuitive idea is that a type constructor with a null region is considered to be
functional, while the presence of a region variable identifies it to be imperative (cf. closure
typing system) as well as provides an abstraction for a set of locations associated with that
object (cf. effects system with regions). Another way to look at this is that a non-null region
expression associated with a type constructor ensures a read/write capability over the objects
of that type, while a null region provides a read-only capability over the objects of that type.

As an example, the type of the application (mkref identity) (Example 2.3) is shown
below under various type systems:¹⁰

Type System                               Type of (mkref identity)
Standard ML ([MTH90, Tof90])              ref (u -> u)
Standard ML/NJ ([AM89, HMV93])            ref (u⁰ -> u⁰)
Simple Effects ([Wri92])                  ref (t -∅-> t) with effect {alloc(t -∅-> t)}
Effects with regions ([TJ92])             ref (t -∅-> t) with effect {alloc_ρ(t -∅-> t)}
Closure Type ([Ler92])                    ref (t -> t)
Closure Type with regions (this work)     ref (r) (t -> t)


¹⁰Standard ML notation [MTH90] uses postfix type constructors in type expressions, as in (u -> u) ref. We will
follow that notation in Chapter 3 when discussing formal semantics. For now, we use prefix type constructors
since they are more intuitive.


In our representation (the last row), the type constructor ref is accompanied by a new 
unique region variable at every application of mkref within the program. These region variables 
participate in type unification thereby abstractly keeping track of the set of statically aliased 
reference locations and their scope of accessibility rather than relying on various classes of type 
variables or a separate set of effects. 

The advantage of this representation is that it allows us to close the type of an imperative 
object by simply replacing the appropriate region variables in a type constructor by the null 
region ε under suitable conditions. The “imperativeness” of an object can still be determined
syntactically by examining its parameterized type constructor, so we are still following the 
closure typing system; no separate effects need to be collected. 

Furthermore, a direct correspondence can be established between a user-defined imperative 
type constructor that is parameterized by one or more non-null regions and a completely func- 
tional version of the same type constructor by simply erasing all its qualifying region expressions 
without disturbing the type constructor itself. For instance, now we can define just a single 
parameterized array datatype vector(ρ) where a region ρ = r represents an I-structure array 
and a region ρ = ε represents a functional array.¹¹ The functional type constructor vector is 
now considered to be a type synonym for vector(ε). 

Having independent region variables also separates the issue of type polymorphism from non- 
mutability quite well. The imperativeness of an object is reflected in the regions associated with 
its type constructor and not in its polymorphic type variables. Indeed, imperative properties 
of a monomorphic type such as point given below can also be accurately represented and 
manipulated: 


Example 2.17: 
type point = pt !float !float; 


The type constructor point will be parameterized by two region variables, point(r1, r2), 
representing the fact that it has two mutable fields that can be closed independently. The exact 
association of region variables to mutable fields can be specified explicitly within the type dec- 
laration or defined implicitly. We will come back to these language design issues in Chapter 4. 
Now, let us look at some sound verification strategies for closing imperative objects. 


2.4.2 Handling the Environment 


Once an imperative object is created and is made accessible as part of the environment at a 
particular scope, it is nearly impossible to close it safely at that scope or at any scope lexically 
inside it because many other objects may already hold a write handle to it. That is why we 
specified the close construct as an encapsulator of the entire program fragment that constructs 
the imperative object (Section 2.3.1) rather than as a mere type-converter. This situation is 
further complicated by the fact that the scope of accessibility of a mutable object is not always 
the same as the scope of its allocation because a locally created object may be made accessible 
non-locally by storing it into a global data-structure. The function escape_1 of Example 2.15 
illustrates this problem. A write handle to the locally allocated object a is made accessible by 
storing it into the global object b. Now anybody looking at b could get hold of a and assign 
into it. Therefore, it is not safe to close or generalize a when it is returned from the function 
escape_1. 


¹¹However, M-structure arrays would still require a separate type constructor in order to distinguish them 
from I-structure arrays. We will come back to this issue in Chapter 4. 

Fortunately, modeling the imperativeness of an object using region variables allows us to 
detect this situation statically. The region variable associated with the type of an imperative 
object becomes visible in the enclosing type environment when that object is exported into the 
enclosing value environment. This is illustrated below: 


Example 2.18: 
b = i_vector (1,1);                  % b :: (vector(r1) (vector(r2) t))
def escape_1 n =
  close { a = i_vector (1,n);
          b[1] = a;                  % a :: (vector(r2) t)
        in a };                      % Unsafe close detected.


The assignment (b[1] = a) causes the region variable r2 contained within the type of the 
array a to become visible in the type environment enclosing the close construct through the 
type of array b. This fact may be used as a static test while typing the close construct to 
detect such escaping objects. This is summarized in the following typing strategy: 


Closing Strategy 1 An object may be safely closed at the lowest lexical scope higher than the 
scope of its creation at which none of the region variables contained in its type occur free in the 
type environment. 
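
The test behind Closing Strategy 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the type checker of this thesis: the class `TCon` and the functions `region_vars` and `may_close` are hypothetical names, types are encoded as constructor applications carrying region variables, and unification is assumed to have already been applied to the types involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TCon:
    """A type constructor application such as vector(r2) t (hypothetical encoding)."""
    name: str
    regions: tuple = ()  # region variables attached to the constructor
    args: tuple = ()     # argument types: TCon instances, or strings for type variables


def region_vars(ty):
    """Collect every region variable occurring anywhere inside a type."""
    if isinstance(ty, str):  # a plain type variable such as "t" carries no region
        return set()
    found = set(ty.regions)
    for arg in ty.args:
        found |= region_vars(arg)
    return found


def may_close(obj_type, type_env):
    """Strategy 1 test: no region variable of obj_type occurs in the type environment."""
    env_regions = set()
    for ty in type_env.values():
        env_regions |= region_vars(ty)
    return region_vars(obj_type).isdisjoint(env_regions)


# Example 2.18: b :: vector(r1) (vector(r2) t) and a :: vector(r2) t
a_type = TCon("vector", ("r2",), ("t",))
b_type = TCon("vector", ("r1",), (a_type,))
escaping = may_close(a_type, {"b": b_type})  # r2 is visible through the type of b
local_only = may_close(a_type, {})           # nothing else mentions r2
```

In this encoding the close in Example 2.18 is rejected because the region variable r2 occurs in the type of the global array b.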


Sometimes, the type of a mutable object can escape into the type environment without 
actually leaving a write handle around. This may happen if the type of the mutable object 
is shared with some other global object due to type-unification. This phenomenon is called 
region-aliasing and is illustrated in the following example: 


Example 2.19: 


a = i_vector (1,1);                         % a :: (vector(r1) t)
b = close { c = i_vector (1,2);             % c :: (vector(r2) t)
            d = if ... then a else c;
          in c };                           % Cannot close due to region-aliasing.


In the above example, the typing of the conditional expression unifies the region variables 
r1 and r2 of the arrays a and c respectively. Now, according to Strategy 1, the array c cannot 
be closed because its region variable is visible in the enclosing type environment even though 
the array itself does not escape into the enclosing scope in any way. Such cases are unavoidable 
in a conservative, static type inference system. 


2.4.3 Handling Structured Results 


Until now we were considering cases where a single, flat, local data-structure is closed and 
returned. The function escape_2 in Example 2.15 illustrates the case when the mutable object 
to be closed is returned as part of another object. In general, multiple objects could be closed 
and returned from a scope and all of them would have to be verified for safety simultaneously 
because they may refer to each other. 

In function escape_2 of Example 2.15, the returned object is a 2-tuple, both of whose 
components point to the same shared array a. Since the second component of the returned tuple 
provides a write handle to the same array, closing its first component should be illegal. We 
reproduce the example below: 


Example 2.20: 
def escape_2 n =
  close { a = i_vector (1,n);        % a :: (vector(r) t)
        in a, a } :: (vector _),_;


In terms of types, we observe that the region variable r in the type of array a would get 
erased in the type of the first component (due to close) but would still be present in the type of 
the second component. This fact can be used to detect such escaping write handles as expressed 
in the following typing strategy: 


Closing Strategy 2 Local data returned from a scope is allowed to be closed only if none of 
the region variables being closed occur free in the remaining type of the returned data. 
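
As a rough illustration of Closing Strategy 2, the following Python sketch checks a returned tuple component by component. The encoding is an assumption made for this example only: a type is a triple (constructor, regions, arguments), a plain string stands for a type variable, and closing a component is taken to close every region variable in it. The names `region_vars` and `may_close_components` are hypothetical.

```python
def region_vars(ty):
    """Region variables of a type encoded as (constructor, regions, args)."""
    if isinstance(ty, str):  # a type variable carries no region
        return set()
    _name, regions, args = ty
    found = set(regions)
    for arg in args:
        found |= region_vars(arg)
    return found


def may_close_components(components, closing):
    """Strategy 2 test on a returned tuple.

    components: list of component types; closing: set of indices to be closed.
    The close is accepted only if no region being closed also occurs in the
    type of a component that stays open.
    """
    closed_regions, remaining_regions = set(), set()
    for index, ty in enumerate(components):
        target = closed_regions if index in closing else remaining_regions
        target.update(region_vars(ty))
    return closed_regions.isdisjoint(remaining_regions)


def vector(region):
    """Type vector(region) t in the tuple encoding used here."""
    return ("vector", (region,), ("t",))


shared_one = may_close_components([vector("r"), vector("r")], {0})     # a, a with only a closed
shared_both = may_close_components([vector("r"), vector("r")], {0, 1}) # both handles closed together
unrelated = may_close_components([vector("r1"), vector("r2")], {0})    # independent arrays
```

Closing only one of two handles to the same array is rejected, while closing both at once, or closing one of two unrelated arrays, passes the test.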


The above strategy stresses two important points. First, we must specify exactly which 
occurrences of region variables we are interested in closing. In some sense, this requires us to 
specify exactly which fields or locations in a mutable object we are interested in closing. 

Second, there should not be any way to access the open version of the object being closed 
via the contents of the object itself. Since the structure of an object is reflected in its type, this 
check can be performed statically by testing whether any of the region variables being closed are 
visible in the type of the rest of the object. Note that this does not preclude the possibility of 
closing recursive or cross-referenced mutable objects. The only restriction is that all references 
to the same mutable object must be closed simultaneously, otherwise the close operation will 
not be safe. 

Note that the function escape_2 would be acceptable if both the write handles being re- 
turned were closed at the same time. The following example would also be acceptable since the 
region variables associated with the types of a and b are unrelated: 


Example 2.21: 
def escape_2' n =
  close { a = i_vector (1,n);
          b = i_vector (1,n);
        in a, b } :: (vector _),_;


Here a may be closed successfully and converted into a functional data-structure while b 
remains mutable and is typed in the usual way. 


2.4.4 Handling Functions 


The simple Strategy 2 works well with explicitly nested, first-order data-structures like tuples 
and arrays. Function closures present a different problem as illustrated by the definition of 
escape_3 in Example 2.15. Here, a write handle to the array a escapes within the definition of 
the function g. The ordinary Hindley/Milner type of the function cannot capture this fact at 
all since it only records the types of arguments and the result of the function. 

This is where Leroy’s closure typing information carried on the function type proves useful. 
Using the closure type, we can easily determine if the region variable being closed is present 
within the returned closure. If so, then the close operation fails. With this addition, 
Strategy 2 is able to detect the escape of the write handle to array a from escape_3 within 
the closure type of the function g. 

Note that in the closure typing system, there is no way to distinguish between a function 
reading from a mutable object and another that writes to it. Therefore, all such functions are 
conservatively considered to be potential writers and the region variables contained within their 
closure types should never be closed. This is expressed in the following strategy: 


Closing Strategy 3 Region variables occurring within the closure type of a function are never 
allowed to be closed. 


In a more expressive effect-based system [TJ92], one might be able to separate functions 
that only read from a mutable object from those that both read and write the object. In that 
case, only the latter class is a candidate for potential type-safety violation; the functions that 
only read from a mutable object may be allowed to close those objects. 


2.5 Summary 


To summarize, we have informally shown above how to extend a state-of-the-art imperative 
type system [Ler92] with a type abstraction mechanism that can be used to convert imperative 
objects into functional objects in a type-safe and transparent manner without the loss of storage 
efficiency or parallelism. Specifically, we have proposed a new type-domain construct called 
close that controls this type abstraction as a program encapsulator. We have informally 
shown several typical uses of such a facility, discussed its implications on efficiency, parallelism 
and dynamic memory access protocols, and outlined possible strategies to verify its correctness 
within the type system. Finally, we have also given a flavor of the kind of syntactic and semantic 
machinery that may be required to express, propagate and analyze such information. The next 
chapter formalizes these ideas in the context of a polymorphic, strict, sequential language and 
shows a soundness theorem guaranteeing that closed objects verified by our type system cannot 
be updated during evaluation. 

Our guiding principle behind this approach has been to engineer a practically useful notion 
of encapsulating imperative programs and data-structures into functional abstractions. Our 
ideas are geared more towards simplicity and run-time performance of user programs (space 
efficiency and preserving parallelism) rather than towards sheer expressive power of the type 
system. 


Chapter 3 


Semantics of “Close” 


In this chapter, we describe the semantics of the close operation. This semantics is presented 
in the framework of a small kernel language that supports recursive functions, tuples, and 
simple reference locations. In Chapter 4, we will extend this system to handle more general 
data-structures such as arrays and algebraic types. Our type system is a direct extension of 
the Closure Typing system presented in Chapter 3 of Xavier Leroy’s Ph.D. thesis [Ler92]. 

We present the static and the dynamic semantics of our kernel language and show a corre- 
spondence between the two in the form of a soundness theorem (Theorem 3.16). This is our 
main result. It gives us the guarantee that well-typed terms do not run into run-time type- 
errors. The theorem also implies that mutable objects can be safely considered to be functional 
once they are successfully closed, i.e., in a type-correct program it is impossible to update an 
object that has been closed by the type system (Corollaries 3.17 and 3.18). Finally, we use the 
same type inference algorithm as described in [Ler92] that infers the correct and most general 
type of every expression in the program. 

As far as possible, we have kept the same mathematical notation as used in [Ler92]. 
Throughout this thesis, all symbols appearing in typewriter font are taken verbatim. They 
denote syntactic entities that stand for themselves. Symbols appearing in SMALL CAPITALS 
denote classes of objects. Unless specified otherwise, Greek symbols and symbols appearing 
in italics stand for meta-variables that can be replaced with specific object instances in their 
class. 


3.1 Kernel Expression Language 


3.1.1 Expression Syntax 


The EXPRESSION language is defined below: 


EXPRESSIONS:  a ::= c                      constant
                |   x                      identifier
                |   op(a)                  primitive application
                |   f where f(x) = a       recursive function
                |   a1 a2                  application
                |   let x = a1 in a2       let-binding
                |   (a1, ..., an)          n-tuple
                |   close a                close expression


In this grammar, x and f range over an infinite set of IDENTIFIERS. c ranges over a predefined 
set of CONSTANTS including unit (()), boolean (true, false) and integer (...,-1,0,1,...) 
constants. 

In the expression op(a), op ranges over a predefined set of OPERATORS including the usual 
arithmetic and comparison operators, ith element projection operators for n-tuples, and a 
ternary conditional operator. This set also includes the primitive operators to allocate (ref), 
dereference (!) and assign (:=) mutable reference locations that will be described later in more 
detail. In general, arguments of multi-arity operators are supplied as tuples,¹ but we will 
freely use special syntax for some common operators, for example (if...then...else...) for the 
conditional operator, (x:=v) for reference assignment, and simple pattern matching for tuple 
projection. 

The expression f where f(x) = a denotes user-defined recursive functions. The identifier 
f can occur inside the expression a. This makes our small language more realistic and allows 
us to provide meaningful examples. The let construct is the source of polymorphism in this 
language. In some of our Id examples, we represent several let-bindings together in a block 
enclosed within braces ({}). Finally, we have added the close construct that enforces functional 
behavior on the data-structure being returned from the expression a. 

The set of FREE IDENTIFIERS of an expression a is denoted by F(a) and is computed in the 
usual manner as shown below: 


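\[
\begin{array}{rcl@{\qquad}rcl}
F(c) &=& \emptyset & F(f\ \mathtt{where}\ f(x) = a) &=& F(a) \setminus \{f, x\} \\
F(x) &=& \{x\} & F(\mathtt{let}\ x = a_1\ \mathtt{in}\ a_2) &=& F(a_1) \cup (F(a_2) \setminus \{x\}) \\
F(op(a)) &=& F(a) & F(a_1, \ldots, a_n) &=& \bigcup_{1 \le i \le n} F(a_i) \\
F(a_1\ a_2) &=& F(a_1) \cup F(a_2) & F(\mathtt{close}\ a) &=& F(a)
\end{array}
\]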
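
The computation of free identifiers can be transcribed directly into a recursive function. The sketch below is illustrative only: the tagged-tuple encoding of kernel expressions and the name `free_identifiers` are assumptions of this example, and it treats both f and x as bound inside the body of f where f(x) = a.

```python
def free_identifiers(a):
    """Free identifiers of a kernel expression encoded as a tagged tuple.

    Assumed encoding (hypothetical):
      ("const", c)  ("var", x)  ("op", name, arg)  ("fun", f, x, body)
      ("app", a1, a2)  ("let", x, a1, a2)  ("tuple", [a1, ..., an])  ("close", arg)
    """
    tag = a[0]
    if tag == "const":
        return set()
    if tag == "var":
        return {a[1]}
    if tag in ("op", "close"):
        return free_identifiers(a[-1])  # the single argument is the last field
    if tag == "fun":
        _, f, x, body = a
        return free_identifiers(body) - {f, x}
    if tag == "app":
        return free_identifiers(a[1]) | free_identifiers(a[2])
    if tag == "let":
        _, x, a1, a2 = a
        return free_identifiers(a1) | (free_identifiers(a2) - {x})
    if tag == "tuple":
        result = set()
        for component in a[1]:
            result |= free_identifiers(component)
        return result
    raise ValueError(f"unknown expression tag: {tag!r}")


# let y = z in ((f where f(x) = f x) (y, w))
term = ("let", "y", ("var", "z"),
        ("app",
         ("fun", "f", "x", ("app", ("var", "f"), ("var", "x"))),
         ("tuple", [("var", "y"), ("var", "w")])))
free_in_term = free_identifiers(term)
free_in_close = free_identifiers(("close", ("op", "ref", ("var", "v"))))
```

Here y is bound by the let and f, x by the function, so only z and w remain free in the first term.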


3.1.2. Dynamic Semantics 


The dynamic semantics of the above language is defined using relational semantics. We define a 
predicate relation between syntactic expressions and results that tells whether a given expression 
can evaluate to a given result. This relation, called EVALUATION JUDGMENT, is of the following 
form: 

\[ e \vdash a/s \Rightarrow r \]


Here e is an ENVIRONMENT, s is an initial STORE, and r is the RESULT of evaluating the 
expression a under the environment e and the initial store s. Evaluation judgments are established 
using a system of axioms and inference rules. This technique is also known as “Structural 
Operational Semantics” (SOS) [Plo81]. 

Semantic Objects 


First, we define the semantic objects used in the dynamic semantics: 


RESULTS:          r ::= v/s                   value and result store
                    |   err                   error
VALUES:           v ::= c                     constant
                    |   (n-tup v1, ..., vn)   n-tuple
                    |   (clsr f, x, a, e)     function closure
                    |   l                     store location
STORABLE VALUES:  w ::= v, rw                 read/write value
                    |   v, ro                 read-only value
ENVIRONMENTS:     e ::= {x1 ↦ v1, ..., xn ↦ vn}
STORES:           s ::= {l1 ↦ w1, ..., ln ↦ wn}


¹Primitive operators are not allowed to be curried. 


An evaluation either results in a type-error or produces a well-defined value along with 
the final store. A well-defined value is either a constant base value, a tuple of values, a function 
closure, or a store location. 

Environments bind free identifiers of an expression to values. Stores map locations to 
storable values that consist of a value and a tag that denotes whether that location has 
read/write or read-only semantics. This flag is used in defining the semantics of the close 
construct. We assume selector functions value(w) and tag(w) that select the value and tag 
respectively from a storable value. 

Both stores and environments are finite mappings that support the following operations: 


Notation 3.1 


1. For any mapping $F$, we denote the DOMAIN of $F$ by $\mathit{Dom}(F)$ and its RANGE by $\mathit{CoDom}(F)$. 
2. The extension of a mapping $F$ at the domain point $p$ with a range value $q$ is written as 
   $F + \{p \mapsto q\}$ and is defined in the usual way: 

   \[ (F + \{p \mapsto q\})(x) = \begin{cases} q & \text{if } x = p \\ F(x) & \text{otherwise} \end{cases} \]

3. The restriction of a mapping $F$ to the domain $A$, where $A \subseteq \mathit{Dom}(F)$, is denoted by $F|_A$. 
4. A finite mapping $F = \{p_1 \mapsto q_1, \ldots, p_n \mapsto q_n\}$ is considered to be undefined outside its 
   domain $\{p_1, \ldots, p_n\}$ unless specified otherwise. 


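Given a value $v$, we inductively define $\mathcal{L}(v)$ to be the set of all locations directly contained 
within it: 

\[
\begin{array}{rcl}
\mathcal{L}(c) &=& \emptyset \\
\mathcal{L}((\mathtt{n\text{-}tup}\ v_1, \ldots, v_n)) &=& \bigcup_{1 \le i \le n} \mathcal{L}(v_i) \\
\mathcal{L}((\mathtt{clsr}\ f, x, a, e)) &=& \mathcal{L}(e) \\
\mathcal{L}(l) &=& \{l\}
\end{array}
\]

For an environment $e = \{x_1 \mapsto v_1, \ldots, x_n \mapsto v_n\}$, we define $\mathcal{L}(e) = \bigcup_{1 \le i \le n} \mathcal{L}(v_i)$. 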


We define the set of locations reachable from a given object with respect to a given store as 
follows: 


Definition 3.2 (Reachability) Given a value $v$ and a store $s$, we define $\mathit{Reachable}(v, s)$ to 
be the set of all locations within the domain of $s$ that are either directly contained within $v$ or 
transitively contained in a value stored at such a location via the store $s$. This extends naturally 
(pointwise) to values present in an environment $e$. 

\[
\begin{array}{rcll}
\mathit{Reachable}(c, s) &=& \emptyset & \\
\mathit{Reachable}((\mathtt{n\text{-}tup}\ v_1, \ldots, v_n), s) &=& \bigcup_{1 \le i \le n} \mathit{Reachable}(v_i, s) & \\
\mathit{Reachable}((\mathtt{clsr}\ f, x, a, e), s) &=& \mathit{Reachable}(e, s) & \\
\mathit{Reachable}(l, s) &=& \emptyset & \text{if } l \notin \mathit{Dom}(s) \\
\mathit{Reachable}(l, s) &=& \{l\} \cup \mathit{Reachable}(v', s) & \text{if } \mathit{value}(s(l)) = v' \\
\mathit{Reachable}(e, s) &=& \bigcup_{1 \le i \le n} \mathit{Reachable}(v_i, s) & \text{where } e = \{x_1 \mapsto v_1, \ldots, x_n \mapsto v_n\}
\end{array}
\]


Although the above definition is correct, it does not lead to a well founded induction on 
the structure of values because we may have circularly defined data-structures. However, at 
any given step of evaluation, the size of a value and the number of locations reachable from it 
are both finite, so we can easily compute the reachable locations using the following recursive 
algorithm that is guaranteed to terminate: 


GATHER-LOCATIONS(v, s, L)

 1  case v of
 2    c                    : return L
 3    (n-tup v1, ..., vn)  : for i ← 1 to n do
 4                             L ← L ∪ GATHER-LOCATIONS(vi, s, L)
 5                           return L
 6    (clsr f, x, a, e)    : let {x1 ↦ v1, ..., xn ↦ vn} = e
 7                           for i ← 1 to n do
 8                             L ← L ∪ GATHER-LOCATIONS(vi, s, L)
 9                           return L
10    l                    : if l ∈ L ∨ l ∉ Dom(s) then return L
11                           else let v' = value(s(l))
12                             return GATHER-LOCATIONS(v', s, {l} ∪ L)

The above algorithm traverses the given value v in a depth-first recursive fashion and ac- 
cumulates the set of all its reachable locations in the variable L. If the current value is a valid 
location of the given store, then its contents are recursively traversed at Line 12 only if it is not 
already in the set L. Thus, no object accessible from the given value is traversed more than 
once and the algorithm is guaranteed to terminate. 

The reachability function given in Definition 3.2 can now be computed as follows: 


Reachable(v, s) = GATHER-LOCATIONS(v, s, ∅) 
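
For readers who prefer executable code, the GATHER-LOCATIONS algorithm can be sketched in Python as follows. The value encoding is an assumption of this sketch: values are tagged tuples, a store is a dictionary from location names to (value, tag) pairs, and the accumulated set L is passed as an immutable frozenset. The function names are hypothetical.

```python
def gather_locations(v, s, seen):
    """Accumulate in `seen` the locations of store `s` reachable from value `v`.

    Assumed value encoding (hypothetical):
      ("const", c)  ("tup", [v1, ..., vn])  ("clsr", f, x, a, env)  ("loc", l)
    where env maps identifiers to values and s maps locations to (value, tag).
    """
    kind = v[0]
    if kind == "const":
        return seen
    if kind == "tup":
        for vi in v[1]:
            seen = seen | gather_locations(vi, s, seen)
        return seen
    if kind == "clsr":
        for vi in v[4].values():  # traverse the values bound in the closure environment
            seen = seen | gather_locations(vi, s, seen)
        return seen
    if kind == "loc":
        l = v[1]
        if l in seen or l not in s:  # already visited, or not a location of this store
            return seen
        stored_value, _tag = s[l]
        return gather_locations(stored_value, s, seen | {l})
    raise ValueError(f"unknown value kind: {kind!r}")


def reachable(v, s):
    """Reachable(v, s) computed from an empty accumulator."""
    return gather_locations(v, s, frozenset())


# A cyclic store: location 1 holds location 2, which holds a tuple pointing back to 1.
store = {
    1: (("loc", 2), "rw"),
    2: (("tup", [("loc", 1), ("const", 0)]), "rw"),
    3: (("const", 7), "rw"),
}
from_one = reachable(("loc", 1), store)
from_closure = reachable(("clsr", "f", "x", None, {"y": ("loc", 3)}), store)
dangling = reachable(("loc", 9), store)
```

The cycle between locations 1 and 2 is traversed once because a location already in the accumulator is never revisited.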


Evaluation Rules 


Figure 3.1 shows the axioms and inference rules for establishing evaluation judgments 
$e \vdash a/s \Rightarrow r$. An axiom $P$ allows us to conclude that the proposition $P$ holds. An inference rule is 
of the form: 

\[ \frac{P_1 \quad \cdots \quad P_n}{P} \]

All the antecedents $P_1, \ldots, P_n$ must hold in order for us to conclude the consequent $P$. 


\[ \text{CONST:}\quad e \vdash c/s \Rightarrow c/s \]

\[ \text{IDENT:}\quad \frac{x \in \mathit{Dom}(e)}{e \vdash x/s \Rightarrow e(x)/s} \]

\[ \text{ABS:}\quad \frac{Y = F(f\ \mathtt{where}\ f(x) = a)}{e \vdash (f\ \mathtt{where}\ f(x) = a)/s \Rightarrow (\mathtt{clsr}\ f, x, a, e|_Y)/s} \]

\[ \text{APP:}\quad \frac{\begin{array}{c} e \vdash a_1/s \Rightarrow (\mathtt{clsr}\ f, x, a_0, e_0)/s_1 \\ e \vdash a_2/s_1 \Rightarrow v_2/s_2 \\ e_0 + \{f \mapsto (\mathtt{clsr}\ f, x, a_0, e_0),\ x \mapsto v_2\} \vdash a_0/s_2 \Rightarrow v/s_3 \end{array}}{e \vdash (a_1\ a_2)/s \Rightarrow v/s_3} \]

\[ \text{TUPLE:}\quad \frac{e \vdash a_1/s \Rightarrow v_1/s_1 \quad \cdots \quad e \vdash a_n/s_{n-1} \Rightarrow v_n/s_n}{e \vdash (a_1, \ldots, a_n)/s \Rightarrow (\mathtt{n\text{-}tup}\ v_1, \ldots, v_n)/s_n} \]

\[ \text{LET:}\quad \frac{e \vdash a_1/s \Rightarrow v_1/s_1 \qquad e + \{x \mapsto v_1\} \vdash a_2/s_1 \Rightarrow v_2/s_2}{e \vdash (\mathtt{let}\ x = a_1\ \mathtt{in}\ a_2)/s \Rightarrow v_2/s_2} \]

\[ \text{ALLOC:}\quad \frac{e \vdash a/s \Rightarrow v/s_1 \qquad l \notin \mathit{Dom}(s_1)}{e \vdash \mathtt{ref}(a)/s \Rightarrow l/(s_1 + \{l \mapsto v, \mathtt{rw}\})} \]

\[ \text{DEREF:}\quad \frac{e \vdash a/s \Rightarrow l/s_1 \qquad l \in \mathit{Dom}(s_1) \qquad \mathit{value}(s_1(l)) = v}{e \vdash\ !a/s \Rightarrow v/s_1} \]

\[ \text{ASSIGN:}\quad \frac{e \vdash a/s \Rightarrow (l, v)/s_1 \qquad l \in \mathit{Dom}(s_1) \qquad \mathit{tag}(s_1(l)) = \mathtt{rw}}{e \vdash\ :=(a)/s \Rightarrow ()/(s_1 + \{l \mapsto v, \mathtt{rw}\})} \]

\[ \text{CLOSE:}\quad \frac{\begin{array}{c} e \vdash a/s \Rightarrow l/s_1 \qquad s_1(l) = v, \mathtt{rw} \\ L = \mathit{Reachable}(l, s_1) \cup \mathit{Reachable}(e, s_1) \cup \bigcup_{l' \in \mathit{Dom}(s)} \mathit{Reachable}(l', s_1) \end{array}}{e \vdash (\mathtt{close}\ a)/s \Rightarrow l/(s_1|_L + \{l \mapsto v, \mathtt{ro}\})} \]


Figure 3.1: The Dynamic Semantics of the Kernel Expression Language. 


The inference rules given in Figure 3.1 provide a strict, sequential, call-by-value semantics 
for our kernel language. This can be seen from the fact that the store is sequentialized 
through various computations (APP, TUPLE, and LET rules) and that function and let bodies 
are evaluated in an environment where arguments are bound to values (APP and LET rules). 
Figure 3.1 only shows the inference rules that lead to the computation of a well-defined value. 
Our convention for the generation or propagation of the err result is as follows. Some rules 
have antecedents that require pattern matching: the operator in the APP rule must evaluate to a 
closure value, the expression in the CLOSE rule must evaluate to a location with a read/write tag, 
the expression in the DEREF rule must evaluate to a location, and the location to be assigned in 
the ASSIGN rule must have a read/write tag. We add an err generating inference rule for every 
case of mismatch between any of these patterns and the actual values and tags found during 
their evaluation. Similarly, err propagating inference rules are added for each antecedent in 
an inference rule that may generate an err result. In all these cases, the consequent simply 
evaluates to the err result and all propositions following the error generating antecedent are 
ignored. 

Most of the axioms and inference rules shown in Figure 3.1 for the various kernel language 
constructs are fairly standard and self explanatory. We have shown the primitive operator rules 
for reference operators only. Usual arithmetic and structural operators (tuple projection) are 
defined in the usual way. The ALLOC rule initializes new reference locations with a value and 
a read/write tag. We assume that an infinite set of new locations is available. The DEREF rule 
reads the value out of an existing location regardless of its tag. The ASSIGN rule only assigns 
to locations which have a read/write tag. 
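
The behaviour of the three reference rules can be illustrated with a small Python sketch. It is not the formal semantics: stores are modelled as dictionaries from integer locations to (value, tag) pairs, the err result is modelled by a hypothetical exception class `EvalError`, and the functions `alloc`, `deref` and `assign` are names chosen for this example.

```python
class EvalError(Exception):
    """Stands in for the err result of the dynamic semantics."""


def alloc(store, value):
    """ALLOC: bind a location outside Dom(store) to (value, 'rw')."""
    loc = max(store, default=0) + 1  # any location not already in the store would do
    return loc, {**store, loc: (value, "rw")}


def deref(store, loc):
    """DEREF: read the value of an existing location regardless of its tag."""
    if loc not in store:
        raise EvalError("dereference of an unknown location")
    value, _tag = store[loc]
    return value


def assign(store, loc, value):
    """ASSIGN: only locations with a read/write tag may be updated."""
    if loc not in store or store[loc][1] != "rw":
        raise EvalError("assignment to a missing or read-only location")
    return {**store, loc: (value, "rw")}


loc, s1 = alloc({}, 1)
s2 = assign(s1, loc, 2)
frozen = {**s2, loc: (deref(s2, loc), "ro")}  # the tag change performed by CLOSE
try:
    assign(frozen, loc, 3)
    rejected = False
except EvalError:
    rejected = True
```

Each operation returns a new store rather than mutating its argument, which mirrors the way the rules thread the store from premise to conclusion.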

The CLOSE rule requires a little more explanation. This rule is the only place where the 
read/write tag of a location is explicitly changed to a read-only tag. This makes that reference 
object non-mutable. We have also restricted the domain of the final store to the reachable 
locations of the location being closed, the current environment and the locations of the initial 
store. This operation removes some non-reachable garbage locations from the final store that 
may contain references to the location being closed. Although this operation seems somewhat 
artificial, it is of immense help in reducing the complexity of the soundness proof later on. We 
motivate the reasons for doing so below. 

A more intuitive semantic rule for the close construct would be: 


\[ \text{CLOSE}':\quad \frac{e \vdash a/s \Rightarrow l/s_1 \qquad s_1(l) = v, \mathtt{rw}}{e \vdash (\mathtt{close}\ a)/s \Rightarrow l/(s_1 + \{l \mapsto v, \mathtt{ro}\})} \]


This rule does not restrict the domain of the resulting store. Why would we want to restrict 
the domain at all? The following example brings out the issue: 


Example 3.1: 
a = close { b = ref 1;
            c = ref b;
          in b };


Within the scope of the close block, a freshly allocated reference c points to another fresh 
reference b. Both these references are present in the store that is returned from the block 
although there is no way to access the reference c once that block is exited. The unreachable 
reference to b via c creates technical problems while showing the correspondence between the 
static and dynamic semantics;² therefore we would like to get rid of it. One direct way of 
achieving this is to restrict the domain of the final store to contain just the reachable locations, 
as we have done in the CLOSE rule above. 
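
The effect of the domain restriction on Example 3.1 can be made concrete with a short Python sketch. The encoding is an assumption of this example: integers are constants, ("loc", n) is a location, any other tuple is an n-tuple of values, closures are omitted, and stores map locations to (value, tag) pairs. The names `locations_in`, `reachable` and `close_location` are hypothetical.

```python
def locations_in(value):
    """Locations directly contained in a value (constants contain none)."""
    if isinstance(value, tuple):
        if len(value) == 2 and value[0] == "loc":
            return [value[1]]
        found = []
        for part in value:
            found.extend(locations_in(part))
        return found
    return []


def reachable(roots, store):
    """Locations of `store` reachable from the given root locations."""
    seen = set()
    pending = list(roots)
    while pending:
        loc = pending.pop()
        if loc in seen or loc not in store:
            continue
        seen.add(loc)
        value, _tag = store[loc]
        pending.extend(locations_in(value))
    return seen


def close_location(loc, env, initial_store, store):
    """Sketch of the CLOSE rule: restrict the store, then retag `loc` as read-only."""
    value, tag = store[loc]
    if tag != "rw":
        raise RuntimeError("err: close expects a read/write location")
    roots = [loc] + list(initial_store)  # the closed location and Dom of the initial store
    for bound_value in env.values():     # plus everything the environment refers to
        roots.extend(locations_in(bound_value))
    keep = reachable(roots, store)
    restricted = {l: w for l, w in store.items() if l in keep}
    restricted[loc] = (value, "ro")
    return restricted


# Example 3.1: b is location 1 and c is location 2; neither exists before the block.
inner_store = {1: (1, "rw"), 2: (("loc", 1), "rw")}
final_store = close_location(1, {}, {}, inner_store)
kept_via_env = close_location(1, {"c": ("loc", 2)}, {}, inner_store)
```

With an empty outer environment the garbage location c disappears from the final store, whereas it is retained if the environment still refers to it.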

The alternate CLOSE' rule is not wrong. We would just have to do more work while showing its 
soundness, restricting our attention to just the reachable locations of the current value and 
the current environment with respect to the current store at every step of the proof, due to 
the presence of garbage locations such as c scattered in its domain. In technical terms, this 
would imply that all our proofs must be carried out using the method of co-induction (due to 
the possibility of having cyclic data-structures within the store) rather than a straightforward 
induction on the structure of the current value and a separate induction involving all the 
locations in the domain of the current store. Therefore, we have opted for the somewhat non- 
intuitive CLOSE rule in order to avoid the complex semantic machinery required to show the 
soundness of the alternate CLOSE’ rule. 


²Since we have not yet shown the static rule for close or the semantic machinery used to show soundness, 
we request the reader to bear with us for the time being. 


3.1.3 Properties of the Evaluation Rules 


In order to convert read/write store locations into read-only locations in a safe manner, we 
need to characterize the allocation, reachability, and manipulation of store locations during an 
evaluation. In this section, we show two important properties: locations reachable through the 
result of an evaluation are either new locations or reachable through the evaluation environment 
(Proposition 3.5), and old locations that get updated during an evaluation are always reachable 
through the evaluation environment (Proposition 3.6). Both these propositions will be used 
later in proving the soundness of the close construct. But, first we show some auxiliary 
propositions. 

It is evident from the evaluation rules presented in Figure 3.1 that the domain of the store 
keeps growing during an evaluation. We do not model storage reclamation in these rules. This 
allows us to state the following: 


Proposition 3.3 Let $a$ be an expression, $v$ be a value, $e$ be an environment, and $s_0$, $s_1$ be initial 
and final stores respectively such that $e \vdash a/s_0 \Rightarrow v/s_1$. Then $\mathit{Dom}(s_0) \subseteq \mathit{Dom}(s_1)$. 


Proof: by induction on the length of the evaluation derivation for a. A simple examination of the 
evaluation rules shows that in all cases except the CLOSE rule, either the domain of the store 
grows or it remains unchanged. In the case of the CLOSE rule, the domain of the final store 
is possibly smaller than that of the intermediate store due to the domain restriction, but it 
still includes the entire domain of the initial store by construction. 


Next, we show that a given property applicable to all locations of a store extends inductively 
to all values and environments that refer to the locations in that store. 


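Proposition 3.4 Let $e$ be an environment and $s_0$, $s_1$ be stores such that $\mathit{Dom}(s_0) \subseteq \mathit{Dom}(s_1)$, 
and for all locations $l \in \mathit{Dom}(s_0)$, 

\[ l' \in [\mathit{Reachable}(l, s_1) \setminus \mathit{Reachable}(l, s_0)] \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \]

Then, for any value $v'$ and environment $e'$ we have, 

\[
\begin{array}{l}
l' \in [\mathit{Reachable}(v', s_1) \setminus \mathit{Reachable}(v', s_0)] \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \\
l' \in [\mathit{Reachable}(e', s_1) \setminus \mathit{Reachable}(e', s_0)] \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0)
\end{array}
\]

As a corollary, for $e' = e$ we have, 

\[ l' \in \mathit{Reachable}(e, s_1) \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \]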


Proof: by structural induction on $v'$ and the values contained in the environment $e'$. We show 
the various cases for values. 

Case 1: $v'$ is $c$. Trivial, since there are no reachable locations from a constant. 

Case 2: $v'$ is $(\mathtt{n\text{-}tup}\ v_1, \ldots, v_n)$. By definition of reachability for tuples we have, 

\[ \mathit{Reachable}((\mathtt{n\text{-}tup}\ v_1, \ldots, v_n), s) = \bigcup_{1 \le i \le n} \mathit{Reachable}(v_i, s) \qquad (3.1) \]

The result follows from above using the induction hypothesis for each individual $v_i$ and the 
following algebraic identity for arbitrary sets: 

\[ \Bigl( \bigcup_{1 \le i \le n} X_i \Bigr) \setminus \Bigl( \bigcup_{1 \le i \le n} Y_i \Bigr) \subseteq \bigcup_{1 \le i \le n} (X_i \setminus Y_i) \qquad (3.2) \]

Case 3: $v'$ is $(\mathtt{clsr}\ f, x, a_0, e_0)$. Same as above. 

Case 4: $v'$ is $l$. If $l \notin \mathit{Dom}(s_1)$ or $l \notin \mathit{Dom}(s_0)$ then we have nothing to prove. Otherwise 
the result follows from the given relation regarding locations. 


The environment hypothesis follows from the value hypothesis using the definition of 
reachability for environments and Equation 3.2. 


Now we prove the proposition that partitions the locations reachable from the result of an 
evaluation into those that are freshly allocated and those that are reachable from the evaluation 
environment. 


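Proposition 3.5 (Fresh Locations) Let $a$ be an expression, $v$ be a value, $e$ be an 
environment, and $s_0$, $s_1$ be initial and final stores respectively such that $e \vdash a/s_0 \Rightarrow v/s_1$. Then, 

\[ l' \in \mathit{Reachable}(v, s_1) \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \]

and for all locations $l \in \mathit{Dom}(s_0)$, 

\[ l' \in [\mathit{Reachable}(l, s_1) \setminus \mathit{Reachable}(l, s_0)] \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0) \]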


Proof: by induction on the length of the evaluation derivation for a. We consider the various cases 
for the last evaluation rule in the derivation. 

Case 1: CONST. Trivial, since $\mathit{Reachable}(c, s_1) = \emptyset$ and $s_0 = s_1$. 

Case 2: IDENT. Trivial, since $\mathit{Reachable}(e(x), s_1) \subseteq \mathit{Reachable}(e, s_1)$ and $s_0 = s_1$. 

Case 3: ABS. Trivial, since $\mathit{Reachable}((\mathtt{clsr}\ f, x, a, e|_Y), s_1) = \mathit{Reachable}(e|_Y, s_1) \subseteq 
\mathit{Reachable}(e, s_1)$ and $s_0 = s_1$. 


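Case 4: APP. The evaluation rule is: 

\[ \frac{\begin{array}{c} e \vdash a_1/s \Rightarrow (\mathtt{clsr}\ f, x, a_0, e_0)/s_1 \\ e \vdash a_2/s_1 \Rightarrow v_2/s_2 \\ e_0 + \{f \mapsto (\mathtt{clsr}\ f, x, a_0, e_0),\ x \mapsto v_2\} \vdash a_0/s_2 \Rightarrow v/s_3 \end{array}}{e \vdash (a_1\ a_2)/s \Rightarrow v/s_3} \]

Let $e_1 = e_0 + \{f \mapsto (\mathtt{clsr}\ f, x, a_0, e_0),\ x \mapsto v_2\}$. First, we show the value hypothesis for this 
case, i.e., we show: 

\[ l' \in \mathit{Reachable}(v, s_3) \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.3) \]

Applying the induction hypothesis for values to the last premise we obtain: 

\[ l' \in \mathit{Reachable}(v, s_3) \Longrightarrow l' \notin \mathit{Dom}(s_2) \lor l' \in \mathit{Reachable}(e_1, s_2) \qquad (3.4) \]

Note that $l' \notin \mathit{Dom}(s_2)$ implies $l' \notin \mathit{Dom}(s)$ because $\mathit{Dom}(s) \subseteq \mathit{Dom}(s_2)$ from 
Proposition 3.3. If $l' \in \mathit{Reachable}(e_1, s_2)$, then using the definition of reachability and $e_1$ we have 
the following two cases: 

• $l' \in \mathit{Reachable}(e_0, s_2)$. In this case, we use the induction hypothesis for locations on 
  the second premise in Proposition 3.4 with environment $e' = e_0$ to obtain: 

  \[ l' \in [\mathit{Reachable}(e_0, s_2) \setminus \mathit{Reachable}(e_0, s_1)] \Longrightarrow l' \notin \mathit{Dom}(s_1) \lor l' \in \mathit{Reachable}(e, s_1) \qquad (3.5) \]

  To eliminate $\mathit{Reachable}(e_0, s_1)$, we use the induction hypothesis for values on the first 
  premise to obtain: 

  \[ l' \in \mathit{Reachable}(e_0, s_1) \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.6) \]

  Also, we simplify $l' \in \mathit{Reachable}(e, s_1)$ on the right hand side of Equation 3.5 by 
  applying the corollary in Proposition 3.4 for the first premise: 

  \[ l' \in \mathit{Reachable}(e, s_1) \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.7) \]

  Combining Equations 3.5, 3.6, and 3.7 we obtain the following as desired: 

  \[ l' \in \mathit{Reachable}(e_0, s_2) \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.8) \]

• $l' \in \mathit{Reachable}(v_2, s_2)$. In this case, we use the induction hypothesis for values on 
  the second premise and then simplify as above using Proposition 3.4 to obtain, 

  \[
  \begin{array}{rcl}
  l' \in \mathit{Reachable}(v_2, s_2) &\Longrightarrow& l' \notin \mathit{Dom}(s_1) \lor l' \in \mathit{Reachable}(e, s_1) \\
  &\Longrightarrow& l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.9)
  \end{array}
  \]

Combining statements 3.8 and 3.9 proves the statement 3.3 as desired. 


Now we show the location hypothesis, i.e., for all locations $l \in \mathit{Dom}(s)$ we show that: 

\[ l' \in [\mathit{Reachable}(l, s_3) \setminus \mathit{Reachable}(l, s)] \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s) \qquad (3.10) \]

We use the following algebraic identity that is true for arbitrary sets: 

\[ X \setminus Y \subseteq (X \setminus Z) \cup (Z \setminus Y) \qquad (3.11) \]

Using this identity, we obtain: 

\[
\begin{array}{l}
[\mathit{Reachable}(l, s_3) \setminus \mathit{Reachable}(l, s)] \\
\quad \subseteq [\mathit{Reachable}(l, s_3) \setminus \mathit{Reachable}(l, s_2)] \cup [\mathit{Reachable}(l, s_2) \setminus \mathit{Reachable}(l, s)] \\
\quad \subseteq [\mathit{Reachable}(l, s_3) \setminus \mathit{Reachable}(l, s_2)] \cup [\mathit{Reachable}(l, s_2) \setminus \mathit{Reachable}(l, s_1)] \,\cup \\
\qquad\ [\mathit{Reachable}(l, s_1) \setminus \mathit{Reachable}(l, s)] \qquad (3.12)
\end{array}
\]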


Now we use the induction hypothesis for locations for each of the three clauses on the right 
and simplify using Propositions 3.4 and 3.3 to obtain the desired result. 
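The set identity 3.11 used in this step holds because any element of $X \setminus Y$ is either outside $Z$ (so it lies in $X \setminus Z$) or inside $Z$ (so it lies in $Z \setminus Y$). As a sanity check only, the following sketch verifies the identity exhaustively over all subsets of a small universe; the function names and the three-element universe are illustrative assumptions, not part of the thesis.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of the given finite universe."""
    items = list(universe)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)


def identity_holds(x, y, z):
    """Check X \\ Y is a subset of (X \\ Z) union (Z \\ Y)."""
    return (x - y) <= ((x - z) | (z - y))


# Exhaustive check over all triples of subsets of a 3-element universe.
universe = {1, 2, 3}
all_ok = all(
    identity_holds(x, y, z)
    for x in subsets(universe)
    for y in subsets(universe)
    for z in subsets(universe)
)
print(all_ok)
```

In the proof, $Z$ is instantiated with the reachability set under an intermediate store, which is how a difference across the whole evaluation is split into differences across individual premises.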


Case 5: TUPLE. The location hypothesis is shown exactly like the case above. We give the
argument for the value hypothesis. The evaluation rule is:

$e \vdash a_1/s_0 \Rightarrow v_1/s_1 \quad \cdots \quad e \vdash a_n/s_{n-1} \Rightarrow v_n/s_n$
----------------------------------------
$e \vdash (a_1, \ldots, a_n)/s_0 \Rightarrow (\text{n-tup}\ v_1, \ldots, v_n)/s_n$

We have to show that:

$l' \in \mathit{Reachable}((\text{n-tup}\ v_1, \ldots, v_n), s_n) \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0)$    (3.13)

Applying the induction hypothesis for values to each premise ($1 \le i \le n$) and simplifying
using Propositions 3.3 and 3.4 we obtain:

$l' \in \mathit{Reachable}(v_i, s_i) \Longrightarrow l' \notin \mathit{Dom}(s_0) \lor l' \in \mathit{Reachable}(e, s_0)$    (3.14)

In order to show Equation 3.13, we need to strengthen Equation 3.14 to $l' \in \mathit{Reachable}(v_i, s_n)$
($1 \le i \le n$). We use the algebraic identity 3.11 repeatedly to obtain the following:

$[\mathit{Reachable}(v_i, s_n) \setminus \mathit{Reachable}(v_i, s_i)] \subseteq \bigcup_{n \ge j > i} [\mathit{Reachable}(v_i, s_j) \setminus \mathit{Reachable}(v_i, s_{j-1})]$    (3.15)


We use the induction hypothesis for locations and Proposition 3.4 to simplify each of the 
clauses on the right in the above statement and plug in Equation 3.14 to obtain the desired 
result of Equation 3.13. 

Case 6: LET — Same argument as in the APP case. 

Case 7: ALLOC. The result follows from the induction hypothesis and the fact that the
allocated location is in fact chosen to be a new location that is not present in $\mathit{Dom}(s_1)$ and
hence not present in $\mathit{Dom}(s)$.

Case 8: DEREF — The result follows directly from the induction hypothesis and the definition 
of reachability for locations. 


Case 9: ASSIGN. The evaluation rule is:

$e \vdash a/s \Rightarrow (l, v)/s_1 \qquad l \in \mathit{Dom}(s_1) \qquad \mathit{tag}(s_1(l)) = \mathrm{rw}$
----------------------------------------
$e \vdash\ :=(a)/s \Rightarrow ()/(s_1 + \{l \mapsto v, \mathrm{rw}\})$

The value hypothesis follows immediately since no locations are reachable from $()$. For
the location hypothesis, note that the final store $s_1 + \{l \mapsto v, \mathrm{rw}\}$ differs from the
intermediate store $s_1$ only at location $l$. Furthermore, using the induction hypothesis for
values we know that:

$l' \in \mathit{Reachable}(v, s_1) \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s)$    (3.16)

Thus, the location hypothesis will be valid for the location $l$ as well, which is assigned the
new value $v$.

Case 10: CLOSE — By construction, the final store contains all the reachable locations from 
the location being closed, the current environment, and the old store. Thus, both value 
and location hypotheses follow directly from the induction hypothesis since changing the 
tag of a location does not affect its reachability. 


Finally, we show the proposition that characterizes the set of locations that may get updated 
during an evaluation. 


Proposition 3.6 (Updated Locations) Let $a$ be an expression, $v$ be a value, $e$ be an environment,
and $s_0$, $s_1$ be initial and final stores respectively such that $e \vdash a/s_0 \Rightarrow v/s_1$. Then for
any location $l \in \mathit{Dom}(s_0)$,

$\mathit{value}(s_0(l)) \ne \mathit{value}(s_1(l)) \Longrightarrow l \in \mathit{Reachable}(e, s_0)$


That is, pre-existing locations that get updated during an evaluation are reachable from the 
environment. 


Proof: by induction on the length of evaluation derivation for $a$. We consider the various cases
for the last evaluation rule in the derivation.

Case 1: CONST, IDENT, and ABS. Trivial, since $s_0 = s_1$.

Case 2: APP. The evaluation rule is:

$e \vdash a_1/s \Rightarrow (\mathrm{clsr}\ f, x, a_0, e_0)/s_1$
$e \vdash a_2/s_1 \Rightarrow v_2/s_2$
$e_0 + \{f \mapsto (\mathrm{clsr}\ f, x, a_0, e_0),\ x \mapsto v_2\} \vdash a_0/s_2 \Rightarrow v/s_3$
----------------------------------------
$e \vdash (a_1\ a_2)/s \Rightarrow v/s_3$

Three possibilities arise for $\mathit{value}(s(l)) \ne \mathit{value}(s_3(l))$:

1. $\mathit{value}(s(l)) \ne \mathit{value}(s_1(l))$: The result follows immediately by applying the induction
hypothesis to the first premise.

2. $\mathit{value}(s(l)) = \mathit{value}(s_1(l))$ but $\mathit{value}(s_1(l)) \ne \mathit{value}(s_2(l))$: Using the induction
hypothesis on the second premise we obtain that $l \in \mathit{Reachable}(e, s_1)$. Using Proposition 3.5
together with Proposition 3.4 for environments we obtain that

$l \in \mathit{Reachable}(e, s_1) \Longrightarrow l \notin \mathit{Dom}(s) \lor l \in \mathit{Reachable}(e, s)$    (3.17)

Since we know that $l \in \mathit{Dom}(s)$, the result follows.

3. $\mathit{value}(s(l)) = \mathit{value}(s_1(l)) = \mathit{value}(s_2(l))$ but $\mathit{value}(s_2(l)) \ne \mathit{value}(s_3(l))$: Using the
induction hypothesis on the third premise we obtain that $l \in \mathit{Reachable}(e_1, s_2)$ where
$e_1 = e_0 + \{f \mapsto (\mathrm{clsr}\ f, x, a_0, e_0),\ x \mapsto v_2\}$. This can be simplified to the desired result
just as in the proof of Proposition 3.5.


Case 3: TUPLE and LET — Same argument as above. 
Case 4: ALLOC and DEREF — The result follows directly from induction hypothesis. 
Case 5: ASSIGN. The evaluation rule is:

$e \vdash a/s \Rightarrow (l, v)/s_1 \qquad l \in \mathit{Dom}(s_1) \qquad \mathit{tag}(s_1(l)) = \mathrm{rw}$
----------------------------------------
$e \vdash\ :=(a)/s \Rightarrow ()/(s_1 + \{l \mapsto v, \mathrm{rw}\})$

For all locations other than $l$, the result follows from the induction hypothesis. In case of
location $l$, we apply Proposition 3.5 to the first premise and obtain that,

$l' \in \mathit{Reachable}((l, v), s_1) \Longrightarrow l' \notin \mathit{Dom}(s) \lor l' \in \mathit{Reachable}(e, s)$    (3.18)

It is clear that $l$ is reachable from the pair $(l, v)$. The result follows from the above
statement and the hypothesis that $l \in \mathit{Dom}(s)$.

Case 6: CLOSE. All locations reachable from the initial store $s$ are included in the final
store by construction. Furthermore, the values present at these locations are the same as
those in the store $s_1$. Thus, the result follows from the induction hypothesis.


3.2 A Closure Typing System 


Now we will describe our extension to Xavier Leroy’s closure typing system [Ler92]. 


3.2.1 Type Syntax 
The type grammar is defined below: 
TYPE VARIABLES:   $\alpha, \beta ::= t$                                  regular type variable
                  $\quad\mid u$                                          closure extension variable
                  $\quad\mid r$                                          region variable
TYPES:            $\tau ::= t$                                           regular type variable
                  $\quad\mid \iota$                                      base type
                  $\quad\mid \tau_1 \xrightarrow{\langle\pi\rangle} \tau_2$   function type
                  $\quad\mid \tau_1, \ldots, \tau_n$                     n-tuple type
                  $\quad\mid \tau\ \mathit{ref}(r)$                      mutable reference type
                  $\quad\mid \tau\ \mathit{ref}(\epsilon)$               non-mutable reference type
CLOSURE TYPES:    $\pi ::= u$                                            closure extension variable
                  $\quad\mid \sigma, \pi$                                closure type
REGIONS:          $\rho ::= r$                                           region variable
                  $\quad\mid \epsilon$                                   null region
TYPE SCHEMES:     $\sigma ::= \forall \alpha_1 \ldots \alpha_n.\ \tau$


In this grammar, a function type ($\to$) is decorated with a CLOSURE TYPE which is a set
of type schemes together with a closure extension variable u. The closure type of a function 
corresponds to the type schemes of the free identifiers of the function that are stored in its 
closure environment. The order of occurrence of the type schemes in a closure type does not 
matter. Note that the above grammar does not allow more than one closure extension variable 
in a closure type. 

A reference type is parameterized by a REGION expression which could be a region variable
$r$ or the null region constant $\epsilon$. Regions serve to model the mutability of store locations, while
types serve to model the structure of dynamic values. That is why the domain of regions is
much simpler than the domain of types.

A region variable parameter $r$ on a mutable reference type serves two purposes. It identifies
the reference type as being mutable and it also serves as an abstract static label for the
corresponding dynamic mutable location (and any other locations aliased to it) that has that type.
This abstraction is useful in tracking the dynamic mutable locations reachable from a given
object by statically observing the region variables present within its type. We will formalize this
correspondence between region variables and mutable locations in Section 3.3. Non-mutable
or "closed" references are identified by a fixed null region constant ($\epsilon$) because there is no need
to keep track of locations that have been closed. Note that $\mathit{ref}(r)$ and $\mathit{ref}(\epsilon)$ are considered to
be distinct type constructors; they have a similar form only for syntactic uniformity.

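For any type object $T$, where $T$ may be a type, a closure type, a region, or a type scheme,
its FREE VARIABLES $\mathcal{F}(T)$ are defined inductively as follows:^2

$\mathcal{F}(t) = \{t\}$
$\mathcal{F}(\iota) = \emptyset$
$\mathcal{F}(\tau_1 \xrightarrow{\langle\pi\rangle} \tau_2) = \mathcal{F}(\tau_1) \cup \mathcal{F}(\pi) \cup \mathcal{F}(\tau_2)$
$\mathcal{F}(\tau_1, \ldots, \tau_n) = \bigcup_{1 \le i \le n} \mathcal{F}(\tau_i)$
$\mathcal{F}(\tau\ \mathit{ref}(\rho)) = \mathcal{F}(\tau) \cup \mathcal{F}(\rho)$
$\mathcal{F}(u) = \{u\}$
$\mathcal{F}(\sigma, \pi) = \mathcal{F}(\sigma) \cup \mathcal{F}(\pi)$
$\mathcal{F}(r) = \{r\}$
$\mathcal{F}(\epsilon) = \emptyset$
$\mathcal{F}(\forall \alpha_1 \ldots \alpha_n.\ \tau) = \mathcal{F}(\tau) \setminus \{\alpha_1 \ldots \alpha_n\}$

In a type scheme $\sigma = \forall \alpha_1 \ldots \alpha_n.\ \tau$, the variables $\{\alpha_1 \ldots \alpha_n\}$ are called the BOUND VARIABLES
denoted by $\mathcal{B}(\sigma)$. For any type object $T$, we also define the DANGEROUS VARIABLES $\mathcal{D}(T)$ and
the DANGEROUS REGION VARIABLES $\mathcal{R}(T)$ inductively as follows:

$\mathcal{D}(t) = \emptyset$
$\mathcal{D}(\iota) = \emptyset$
$\mathcal{D}(\tau_1 \xrightarrow{\langle\pi\rangle} \tau_2) = \mathcal{D}(\pi)$
$\mathcal{D}(\tau_1, \ldots, \tau_n) = \bigcup_{1 \le i \le n} \mathcal{D}(\tau_i)$
$\mathcal{D}(\tau\ \mathit{ref}(r)) = \mathcal{F}(\tau\ \mathit{ref}(r))$
$\mathcal{D}(\tau\ \mathit{ref}(\epsilon)) = \mathcal{D}(\tau)$
$\mathcal{D}(u) = \emptyset$
$\mathcal{D}(\sigma, \pi) = \mathcal{D}(\sigma) \cup \mathcal{D}(\pi)$
$\mathcal{D}(r) = \emptyset$
$\mathcal{D}(\epsilon) = \emptyset$
$\mathcal{D}(\forall \alpha_1 \ldots \alpha_n.\ \tau) = \mathcal{D}(\tau) \setminus \{\alpha_1 \ldots \alpha_n\}$

$\mathcal{R}(t) = \emptyset$
$\mathcal{R}(\iota) = \emptyset$
$\mathcal{R}(\tau_1 \xrightarrow{\langle\pi\rangle} \tau_2) = \mathcal{R}(\pi)$
$\mathcal{R}(\tau_1, \ldots, \tau_n) = \bigcup_{1 \le i \le n} \mathcal{R}(\tau_i)$
$\mathcal{R}(\tau\ \mathit{ref}(\rho)) = \mathcal{R}(\tau) \cup \mathcal{F}(\rho)$
$\mathcal{R}(u) = \emptyset$
$\mathcal{R}(\sigma, \pi) = \mathcal{R}(\sigma) \cup \mathcal{R}(\pi)$
$\mathcal{R}(r) = \emptyset$
$\mathcal{R}(\epsilon) = \emptyset$
$\mathcal{R}(\forall \alpha_1 \ldots \alpha_n.\ \tau) = \mathcal{R}(\tau) \setminus \{\alpha_1 \ldots \alpha_n\}$

^2 Note that we are using the same notation here as that for computing the free identifiers of an expression
because it represents the same concept. The meaning is always clear by context since we never mix types and
expressions.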


Specifically, for a mutable reference, the region associated with that type and all type vari- 
ables contained within it are considered to be dangerous. The variables occurring inside a 
non-mutable reference type are not considered to be dangerous. For a function closure, the 
typing rules shown later ensure that the types of all objects reachable from the closure environ- 
ment are recorded in its closure type. Therefore, the types of mutable references accessible via 
the closure environment are also visible in its closure type and are considered to be dangerous. 

Using the type abstractions shown above, we can accurately capture and control the static 
(type polymorphism) and the dynamic (mutability) properties of imperative data-structures. 
The basic idea of our type system is to use the type of a composite object as a clue to the 
reachable mutable reference locations contained within it. Dangerous variables provide this 
clue directly from the overall type of an object. Intuitively, dangerous type variables model the 
polymorphic values stored within mutable objects and the dangerous region variables model 
the mutable locations contained within such objects. 
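The inductive definitions of free, dangerous, and dangerous region variables can be read as three recursive traversals over the type grammar. The following is a minimal sketch under stated assumptions: the class names (`TVar`, `Base`, `Ref`, `Tup`, `Scheme`, `Closure`, `Fun`), the use of `None` for the null region, and the representation of all variables as strings with distinct names are illustrative choices, not notation from the thesis.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TVar:            # regular type variable t
    name: str


@dataclass(frozen=True)
class Base:            # base type
    name: str


@dataclass(frozen=True)
class Ref:             # tau ref(rho); region None stands for the null region
    elem: object
    region: Optional[str]


@dataclass(frozen=True)
class Tup:             # n-tuple type
    items: Tuple[object, ...]


@dataclass(frozen=True)
class Scheme:          # forall bound . body
    bound: frozenset
    body: object


@dataclass(frozen=True)
class Closure:         # closure type: type schemes plus one extension variable
    schemes: Tuple[Scheme, ...]
    ext: str


@dataclass(frozen=True)
class Fun:             # function type decorated with a closure type
    arg: object
    clos: Closure
    res: object


def union(sets):
    out = set()
    for s in sets:
        out |= s
    return out


def free(x):
    """F(x): free type, closure extension, and region variables."""
    if isinstance(x, TVar):
        return {x.name}
    if isinstance(x, Base):
        return set()
    if isinstance(x, Ref):
        return free(x.elem) | ({x.region} if x.region is not None else set())
    if isinstance(x, Tup):
        return union(free(t) for t in x.items)
    if isinstance(x, Fun):
        return free(x.arg) | free(x.clos) | free(x.res)
    if isinstance(x, Closure):
        return {x.ext} | union(free(s) for s in x.schemes)
    if isinstance(x, Scheme):
        return free(x.body) - x.bound
    raise TypeError(x)


def dangerous(x, regions_only=False):
    """D(x), or R(x) when regions_only is true."""
    if isinstance(x, (TVar, Base)):
        return set()
    if isinstance(x, Ref):
        if regions_only:
            own = {x.region} if x.region is not None else set()
            return dangerous(x.elem, True) | own
        # A mutable reference makes everything inside it dangerous.
        return free(x) if x.region is not None else dangerous(x.elem)
    if isinstance(x, Tup):
        return union(dangerous(t, regions_only) for t in x.items)
    if isinstance(x, Fun):
        # Only the closure type contributes; argument and result do not.
        return dangerous(x.clos, regions_only)
    if isinstance(x, Closure):
        return union(dangerous(s, regions_only) for s in x.schemes)
    if isinstance(x, Scheme):
        return dangerous(x.body, regions_only) - x.bound
    raise TypeError(x)


pair = Tup((Ref(TVar("t1"), "r"), Ref(TVar("t2"), None)))
fn = Fun(Base("int"),
         Closure((Scheme(frozenset(), Ref(TVar("t"), "r")),), "u"),
         Base("int"))
print(sorted(free(pair)), sorted(dangerous(pair)), sorted(dangerous(pair, True)))
print(sorted(free(fn)), sorted(dangerous(fn)), sorted(dangerous(fn, True)))
```

In the `pair` example only the mutable component contributes dangerous variables, and in the `fn` example the mutable reference captured in the closure type is what makes `t` and `r` dangerous in the function type.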


3.2.2 Static Semantics 


The static semantics of our kernel language is defined in the same manner as its dynamic 
semantics. We define a predicate relation between syntactic expressions and types that tells that 
a given expression elaborates to a given type. This relation, called ELABORATION JUDGMENT, 
is of the following form: 

$E \vdash a : \tau$


Here $E$ is a TYPE ENVIRONMENT which is defined below as a finite mapping from identifiers to
type schemes.


TYPE ENVIRONMENTS: $E ::= \{x_1 \mapsto \sigma_1, \ldots, x_n \mapsto \sigma_n\}$


TYPE SUBSTITUTIONS over this type algebra are finite mappings from regular type variables
to types, from closure extension variables to closure types, and from region variables to other
region variables. We do not allow substituting region variables with the null region ($\epsilon$) because
that would convert a mutable reference type into a non-mutable reference type. This operation
should only be performed when it is determined to be safe and is explicitly done using the
close construct.

TYPE SUBSTITUTIONS: $\varphi, \psi ::= \{t \mapsto \tau, \ldots, u \mapsto \pi, \ldots, r \mapsto r', \ldots\}$

Type substitutions are taken to be the identity mapping outside their specified finite domain.
They also extend naturally over types, closure types, and type schemes, being applied to their
free variables in each case. For a type scheme $\sigma = \forall \alpha_1 \ldots \alpha_n.\ \tau$, it may be necessary to rename
some of its bound variables $\alpha_i$ so that they are OUT OF REACH for the type substitution $\varphi$, i.e.,
no $\alpha_i$ is in $\mathit{Dom}(\varphi)$ and no $\alpha_i$ occurs free in any type, closure type or region in $\mathit{CoDom}(\varphi)$.
Then, the substitution is defined by:

$\varphi(\forall \alpha_1 \ldots \alpha_n.\ \tau) = \forall \alpha_1 \ldots \alpha_n.\ \varphi(\tau)$

The INSTANTIATION of a type scheme $\sigma = \forall \alpha_1 \ldots \alpha_n.\ \tau_0$ to a type $\tau$, written as $\sigma \ge \tau$, is
defined if there exists a type substitution $\varphi$ with $\mathit{Dom}(\varphi) \subseteq \{\alpha_1 \ldots \alpha_n\}$ such that $\tau = \varphi(\tau_0)$.

In order to simplify our notation for computing free and dangerous variables of sets of 
objects, we use the following convention: 


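Notation 3.7 Given a set of objects $P$, we write $\mathcal{F}(P) = \bigcup_{T \in P} \mathcal{F}(T)$, $\mathcal{D}(P) = \bigcup_{T \in P} \mathcal{D}(T)$, and $\mathcal{R}(P) = \bigcup_{T \in P} \mathcal{R}(T)$.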


The effect of type substitutions on the free and dangerous variables is now captured in the 
following proposition: 


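Proposition 3.8 Let $\varphi$ be a type substitution. For any $T$, where $T$ could be a type $\tau$, a closure
type $\pi$, a region $\rho$, or a type scheme $\sigma$, we have:

$\mathcal{F}(\varphi(T)) = \mathcal{F}(\varphi(\mathcal{F}(T)))$
$\mathcal{F}(\varphi(\mathcal{D}(T))) \subseteq \mathcal{D}(\varphi(T)) \subseteq \mathcal{F}(\varphi(\mathcal{D}(T))) \cup \mathcal{D}(\varphi(\mathcal{F}(T)))$

Proof: Both these relations follow directly from the definitions of $\mathcal{F}(T)$ and $\mathcal{D}(T)$ by a
simultaneous structural induction over the appropriate type object $T$.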


The first equation provides an exact relationship between the free variables of a type before
and after applying a type substitution to it. On the other hand, the second pair of inequalities
provide only an approximation to the set of dangerous variables of a type after applying a
type substitution to it. This is so because the substitution images of dangerous variables of a
type ($\mathcal{F}(\varphi(\mathcal{D}(\tau)))$) may not cover all the dangerous variables of the substituted type ($\mathcal{D}(\varphi(\tau))$).
Some non-dangerous variable may get substituted with a type containing dangerous variables
that must also be counted as dangerous in the final type.
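As a small illustration of why the lower bound can be strict, consider a type that is a single regular type variable and a substitution that replaces it with a mutable reference type. The particular names $t$, $t'$, and $r$ below are chosen only for this example.

```latex
\begin{align*}
\tau &= t, \qquad \varphi = \{\, t \mapsto t'\ \mathit{ref}(r) \,\} \\
\mathcal{F}(\varphi(\mathcal{D}(\tau))) &= \mathcal{F}(\varphi(\emptyset)) = \emptyset \\
\mathcal{D}(\varphi(\tau)) &= \mathcal{D}(t'\ \mathit{ref}(r)) = \{\, t', r \,\} \\
\mathcal{D}(\varphi(\mathcal{F}(\tau))) &= \mathcal{D}(\varphi(\{t\})) = \mathcal{D}(t'\ \mathit{ref}(r)) = \{\, t', r \,\}
\end{align*}
```

Here $\emptyset \subsetneq \{t', r\} \subseteq \emptyset \cup \{t', r\}$, so the dangerous variables of the substituted type come entirely from the second term of the upper bound.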


Typing Rules 


Figure 3.2 shows the axioms and the inference rules for establishing elaboration judgments
$E \vdash a : \tau$. The CONST and the PRIMAPP rules establish the elaboration judgment for a constant
or a primitive operator application according to a predefined relation typeof that provides the
type scheme associated with them. All such predefined type schemes are fully-quantified: there
are no free variables in these type schemes. Most constants and operators have the obvious type
schemes. We only show the predefined type schemes of the three reference operators below:

typeof(ref)             $= \forall t, u, r.\ t \xrightarrow{\langle u \rangle} t\ \mathit{ref}(r)$
typeof(! mutable)       $= \forall t, u, r.\ t\ \mathit{ref}(r) \xrightarrow{\langle u \rangle} t$
typeof(! non-mutable)   $= \forall t, u.\ t\ \mathit{ref}(\epsilon) \xrightarrow{\langle u \rangle} t$
typeof(:=)              $= \forall t, u, r.\ (t\ \mathit{ref}(r), t) \xrightarrow{\langle u \rangle} \mathit{unit}$
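CONST:
$\mathit{typeof}(c) \ge \tau$
----------------------------------------
$E \vdash c : \tau$

PRIMAPP:
$\mathit{typeof}(op) \ge \tau_1 \xrightarrow{\langle\pi\rangle} \tau_2 \qquad E \vdash a : \tau_1$
----------------------------------------
$E \vdash op(a) : \tau_2$

IDENT:
$x \in \mathit{Dom}(E) \qquad E(x) \ge \tau$
----------------------------------------
$E \vdash x : \tau$

ABS:
$\{y_1 \ldots y_n\} = \mathcal{F}(f\ \mathrm{where}\ f(x) = a)$
$E + \{f \mapsto \tau_1 \xrightarrow{\langle E(y_1), \ldots, E(y_n), \pi \rangle} \tau_2,\ x \mapsto \tau_1\} \vdash a : \tau_2$
----------------------------------------
$E \vdash (f\ \mathrm{where}\ f(x) = a) : \tau_1 \xrightarrow{\langle E(y_1), \ldots, E(y_n), \pi \rangle} \tau_2$

APP:
$E \vdash a_1 : \tau_1 \xrightarrow{\langle\pi\rangle} \tau_2 \qquad E \vdash a_2 : \tau_1$
----------------------------------------
$E \vdash (a_1\ a_2) : \tau_2$

TUPLE:
$E \vdash a_1 : \tau_1 \quad \cdots \quad E \vdash a_n : \tau_n$
----------------------------------------
$E \vdash (a_1, \ldots, a_n) : \tau_1, \ldots, \tau_n$

LET:
$E \vdash a_1 : \tau_1 \qquad E + \{x \mapsto \mathit{Gen}(E, \tau_1)\} \vdash a_2 : \tau_2$
----------------------------------------
$E \vdash (\mathrm{let}\ x = a_1\ \mathrm{in}\ a_2) : \tau_2$

CLOSE:
$E \vdash a : \tau\ \mathit{ref}(r) \qquad r \notin (\mathcal{F}(E) \cup \mathcal{F}(\tau))$
----------------------------------------
$E \vdash (\mathrm{close}\ a) : \tau\ \mathit{ref}(\epsilon)$


Figure 3.2: The Static Semantics of the Kernel Expression Language.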


There are two different types for the dereference operator (!), one for mutable references 
and the other for non-mutable references. This is because we consider mutable reference types 
as distinct from non-mutable reference types. Essentially, we overload the use of the dereference 
operation with these two types. This does not create any problem since the exact type to be 
used is always clear from context. Moreover, in our kernel language, the underlying dynamic 
dereferencing operation is the same in both cases. 

The IDENT rule instantiates the type scheme of an identifier stored in the type environment.
The ABS rule shows how closure types are created in this system. The type schemes of all the
free identifiers of the function are stored in its closure type. This is necessary to keep track
of the mutable locations accessible through the closure environment. The APP and the TUPLE
rules are self-explanatory. The APP rule also handles primitive operator applications.

The LET rule allows a type to be quantified and added to the type environment as a type 
scheme. The GENERALIZATION operation in the LET rule is defined as follows: 


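$\mathit{Gen}(E, \tau) = \forall \alpha_1 \ldots \alpha_n.\ \tau$ where $\{\alpha_1 \ldots \alpha_n\} = \mathcal{F}(\tau) \setminus \mathcal{D}(\tau) \setminus \mathcal{F}(E)$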
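The generalization operation quantifies exactly those variables that are free in the type, not dangerous in it, and not free in the environment. A minimal sketch follows, assuming a deliberately simplified type representation as tagged tuples (`("var", name)`, `("base", name)`, `("ref", elem, region)` with `None` for the null region, and `("tup", [items])`); function types and closure types are omitted here for brevity, and all helper names are hypothetical.

```python
def free(ty):
    """Free variables of a simplified type (no function types)."""
    tag = ty[0]
    if tag == "var":
        return {ty[1]}
    if tag == "base":
        return set()
    if tag == "ref":
        own = {ty[2]} if ty[2] is not None else set()
        return free(ty[1]) | own
    if tag == "tup":
        out = set()
        for item in ty[1]:
            out |= free(item)
        return out
    raise ValueError(tag)


def dangerous(ty):
    """Dangerous variables: everything under a mutable reference."""
    tag = ty[0]
    if tag in ("var", "base"):
        return set()
    if tag == "ref":
        return free(ty) if ty[2] is not None else dangerous(ty[1])
    if tag == "tup":
        out = set()
        for item in ty[1]:
            out |= dangerous(item)
        return out
    raise ValueError(tag)


def free_env(env):
    """Free variables of an environment mapping names to (bound, body) schemes."""
    out = set()
    for bound, body in env.values():
        out |= free(body) - set(bound)
    return out


def generalize(env, ty):
    """Gen(E, tau): quantify F(tau) minus D(tau) minus F(E)."""
    bound = free(ty) - dangerous(ty) - free_env(env)
    return (frozenset(bound), ty)


env = {"x": (frozenset(), ("var", "t0"))}
ty = ("tup", [("var", "t0"), ("var", "t1"), ("ref", ("var", "t2"), "r")])
print(sorted(generalize(env, ty)[0]))

closed = ("ref", ("var", "t2"), None)
print(sorted(generalize({}, closed)[0]))
```

In the first call `t0` is excluded because it is free in the environment, and `t2` and `r` are excluded because they occur under a mutable reference. In the second call the reference is closed, so `t2` is no longer dangerous and can be generalized.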


Finally, the CLOSE rule converts a mutable reference type into a non-mutable reference type
by erasing its region variable and replacing it with the null region ($\epsilon$). This is an explicit type
conversion operation on the mutable reference type. The side condition ensures the soundness of
this operation by checking that the region being closed does not escape from the current scope.
This is the exact formalization of the informal closing strategies described in Section 2.4.
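The side condition of the CLOSE rule can be pictured as a simple check on region variables. The sketch below assumes the same simplified tagged-tuple representation of types as a stand-in for the full type algebra (`("ref", elem, region)` with `None` for the null region); `close_type` and the other helper names are hypothetical and do not appear in the thesis.

```python
def free(ty):
    """Free variables of a simplified type (no function types)."""
    tag = ty[0]
    if tag == "var":
        return {ty[1]}
    if tag == "base":
        return set()
    if tag == "ref":
        own = {ty[2]} if ty[2] is not None else set()
        return free(ty[1]) | own
    if tag == "tup":
        out = set()
        for item in ty[1]:
            out |= free(item)
        return out
    raise ValueError(tag)


def free_env(env):
    """Free variables of an environment mapping names to (bound, body) schemes."""
    out = set()
    for bound, body in env.values():
        out |= free(body) - set(bound)
    return out


def close_type(env, ty):
    """Type of (close a) given the type of a, or TypeError if the region escapes."""
    if ty[0] != "ref" or ty[2] is None:
        raise TypeError("close expects a mutable reference type")
    elem, region = ty[1], ty[2]
    # Side condition: the region must not be free in E or in the element type.
    if region in free_env(env) | free(elem):
        raise TypeError("region %s escapes the current scope" % region)
    return ("ref", elem, None)


int_ref = ("ref", ("base", "int"), "r")
print(close_type({}, int_ref))

aliasing_env = {"y": (frozenset(), int_ref)}
try:
    close_type(aliasing_env, int_ref)
except TypeError as err:
    print("rejected:", err)
```

The second call is rejected because the identifier `y` in the environment still carries region `r`, so a mutable alias to the same location could remain visible after closing.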


3.2.3 Properties of the Typing Rules


In this section, we will present some syntactic properties of the typing rules presented above. 
The most important property is the following proposition that states that typing is stable under 
type substitution. This property is essential for performing type inference (Section 3.4) because 
it guarantees that all incremental type refinements (via type substitutions) to a given typing 
of an expression yield legal typings of that expression. Thus, the typing of an expression can 
be automatically refined to match that of its enclosing context. 


Proposition 3.9 (Stability under Type Substitution) Let $a$ be an expression, $\tau$ be a type,
$E$ be a type environment, and $\varphi$ be a substitution. If $E \vdash a : \tau$, then $\varphi(E) \vdash a : \varphi(\tau)$.


Proof: by structural induction over $a$. For completeness, we show all the cases.

Case 1: $a$ is $c$. The CONST rule applies:

$\mathit{typeof}(c) \ge \tau$
----------------------------------------
$E \vdash c : \tau$

Let $\mathit{typeof}(c) = \forall \alpha_1 \ldots \alpha_n.\ \tau_0$ and $\psi$ be its instantiation substitution such that $\tau = \psi(\tau_0)$
with $\mathit{Dom}(\psi) \subseteq \{\alpha_1 \ldots \alpha_n\}$. After renaming if necessary, assume that the $\alpha_i$ are out of reach
for $\varphi$. Now define a substitution $\psi'$ with domain $\{\alpha_1 \ldots \alpha_n\}$ such that $\psi'(\alpha_i) = \varphi(\psi(\alpha_i))$.
Since the type scheme $\mathit{typeof}(c)$ is assumed to be fully-quantified, there are no free
variables in $\tau_0$ other than the $\alpha_i$. Thus $\psi'(\tau_0) = \varphi(\psi(\tau_0)) = \varphi(\tau)$, which implies that
$\mathit{typeof}(c) \ge \varphi(\tau)$. The desired result follows using the CONST rule.

Case 2: $a$ is $op(a)$. We proceed exactly like the previous case to show that $\mathit{typeof}(op) \ge
\varphi(\tau_1 \xrightarrow{\langle\pi\rangle} \tau_2)$. The desired result follows from the induction hypothesis on the second
antecedent.


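Case 3: $a$ is $x$. The IDENT rule applies:

$x \in \mathit{Dom}(E) \qquad E(x) \ge \tau$
----------------------------------------
$E \vdash x : \tau$

Let $E(x) = \forall \alpha_1 \ldots \alpha_n.\ \tau_0$ and $\psi$ be its instantiation substitution such that $\tau = \psi(\tau_0)$ with
$\mathit{Dom}(\psi) \subseteq \{\alpha_1 \ldots \alpha_n\}$. After renaming if necessary, assume that the $\alpha_i$ are out of reach for $\varphi$,
so that $\varphi(E(x)) = \forall \alpha_1 \ldots \alpha_n.\ \varphi(\tau_0)$. Now define a substitution $\psi'$ with domain $\{\alpha_1 \ldots \alpha_n\}$
such that $\psi'(\alpha_i) = \varphi(\psi(\alpha_i))$. We have,

$\psi'(\varphi(\alpha_i)) = \psi'(\alpha_i) = \varphi(\psi(\alpha_i)) \quad \forall i$, since the $\alpha_i$ are out of reach of $\varphi$
$\psi'(\varphi(\beta)) = \varphi(\beta) = \varphi(\psi(\beta)) \quad \forall \beta \ne \alpha_i$

Thus $\psi'(\varphi(\tau_0)) = \varphi(\psi(\tau_0)) = \varphi(\tau)$, which implies that $\varphi(E(x)) \ge \varphi(\tau)$. This allows us to
conclude $\varphi(E) \vdash x : \varphi(\tau)$ as desired.


Case 4: $a$ is $(f\ \mathrm{where}\ f(x) = a)$, $(a_1, \ldots, a_n)$, or $(a_1\ a_2)$. All these cases follow immediately
using the induction hypothesis on their respective antecedents.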


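Case 5: $a$ is $(\mathrm{let}\ x = a_1\ \mathrm{in}\ a_2)$. The typing derivation ends in the LET rule:

$E \vdash a_1 : \tau_1 \qquad E + \{x \mapsto \mathit{Gen}(E, \tau_1)\} \vdash a_2 : \tau_2$
----------------------------------------
$E \vdash \mathrm{let}\ x = a_1\ \mathrm{in}\ a_2 : \tau_2$

By definition of generalization we have,

$\mathit{Gen}(E, \tau_1) = \forall \alpha_1 \ldots \alpha_n.\ \tau_1$ and $\{\alpha_1 \ldots \alpha_n\} = \mathcal{F}(\tau_1) \setminus \mathcal{D}(\tau_1) \setminus \mathcal{F}(E)$    (3.19)

Let $\beta_1 \ldots \beta_n$ be new variables that are out of reach of $\varphi$ and are not free in $E$. Define a
new substitution $\varphi' = \varphi \circ \{\alpha_i \mapsto \beta_i\}$. Using the induction hypothesis we have,

$\varphi(E) \vdash a_1 : \varphi'(\tau_1)$    (3.20)
$\varphi(E) + \{x \mapsto \varphi(\mathit{Gen}(E, \tau_1))\} \vdash a_2 : \varphi(\tau_2)$    (3.21)

Since no $\alpha_i$ is free in $E$, we have $\varphi'(E) = \varphi(E)$. Therefore, in order to apply the LET rule
to the induction judgments 3.20 and 3.21 we need to show the following:

$\varphi(\mathit{Gen}(E, \tau_1)) = \mathit{Gen}(\varphi'(E), \varphi'(\tau_1))$    (3.22)

We show this in two steps. Define $V = \mathcal{F}(\varphi'(\tau_1)) \setminus \mathcal{D}(\varphi'(\tau_1)) \setminus \mathcal{F}(\varphi'(E))$.

SubCase 5.1: $\{\beta_1 \ldots \beta_n\} \subseteq V$. We follow the definition of $V$ given above. We have,

1. $\beta_i \in \mathcal{F}(\varphi'(\tau_1))$: From Proposition 3.8 we obtain $\mathcal{F}(\varphi'(\tau_1)) = \mathcal{F}(\varphi'(\mathcal{F}(\tau_1)))$, and for
$\alpha_i \in \mathcal{F}(\tau_1)$ we have $\mathcal{F}(\varphi'(\alpha_i)) = \mathcal{F}(\beta_i) = \{\beta_i\}$.

2. $\beta_i \notin \mathcal{D}(\varphi'(\tau_1))$: From Proposition 3.8 we obtain $\mathcal{D}(\varphi'(\tau_1)) \subseteq \mathcal{F}(\varphi'(\mathcal{D}(\tau_1))) \cup
\mathcal{D}(\varphi'(\mathcal{F}(\tau_1)))$. Now we have,

• $\beta_i \notin \mathcal{F}(\varphi'(\mathcal{D}(\tau_1)))$: From Equation 3.19, $\alpha_i \notin \mathcal{D}(\tau_1)$ and for all $\alpha \ne \alpha_i$, $\beta_i \notin
\mathcal{F}(\varphi'(\alpha))$ since the $\beta_i$ are chosen to be out of reach of $\varphi$.
• $\beta_i \notin \mathcal{D}(\varphi'(\mathcal{F}(\tau_1)))$: By definition $\mathcal{D}(\varphi'(\alpha_i)) = \mathcal{D}(\beta_i) = \emptyset$ and for all $\alpha \ne \alpha_i$,
$\beta_i \notin \mathcal{D}(\varphi'(\alpha))$.

3. $\beta_i \notin \mathcal{F}(\varphi'(E))$: From Proposition 3.8 we obtain $\mathcal{F}(\varphi'(E)) = \mathcal{F}(\varphi'(\mathcal{F}(E)))$. Now
from Equation 3.19, $\alpha_i \notin \mathcal{F}(E)$ and for all $\alpha \ne \alpha_i$, $\beta_i \notin \mathcal{F}(\varphi'(\alpha))$.

SubCase 5.2: $V \subseteq \{\beta_1 \ldots \beta_n\}$. Suppose we have a $\beta \in \mathcal{F}(\varphi'(\tau_1))$ such that $\beta \ne \beta_i$ for all $i$. We
wish to show that $\beta \notin V$.

From Proposition 3.8 we obtain $\beta \in \mathcal{F}(\varphi'(\mathcal{F}(\tau_1)))$. Let $\alpha \in \mathcal{F}(\tau_1)$ be such that
$\beta \in \mathcal{F}(\varphi'(\alpha))$. Now $\alpha \ne \alpha_i$, otherwise $\beta \in \mathcal{F}(\varphi'(\alpha_i)) = \mathcal{F}(\beta_i) = \{\beta_i\}$. Using Equation 3.19
we must have one of the following situations:

1. $\alpha \in \mathcal{D}(\tau_1)$: This implies that $\beta \in \mathcal{F}(\varphi'(\mathcal{D}(\tau_1))) \Longrightarrow \beta \in \mathcal{D}(\varphi'(\tau_1))$ using Proposition 3.8.
It follows from the definition of $V$ that $\beta \notin V$.

2. $\alpha \in \mathcal{F}(E)$: This implies that $\beta \in \mathcal{F}(\varphi'(\mathcal{F}(E))) \Longrightarrow \beta \in \mathcal{F}(\varphi'(E))$ using Proposition 3.8.
Again, it follows from the definition of $V$ that $\beta \notin V$.

Combining the above two cases we obtain $V = \{\beta_1 \ldots \beta_n\}$. Now we have,

$\mathit{Gen}(\varphi'(E), \varphi'(\tau_1)) = \forall \beta_1 \ldots \beta_n.\ \varphi'(\tau_1)$    by definition of generalization
$\qquad = \varphi(\forall \alpha_1 \ldots \alpha_n.\ \tau_1)$    by substitution over type schemes
$\qquad = \varphi(\mathit{Gen}(E, \tau_1))$

This is the desired result of Equation 3.22, so the LET rule can now be applied on the
induction hypotheses 3.20 and 3.21.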


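Case 6: $a$ is $(\mathrm{close}\ a)$. The typing derivation ends in the CLOSE rule:

$E \vdash a : \tau\ \mathit{ref}(r) \qquad r \notin (\mathcal{F}(E) \cup \mathcal{F}(\tau))$
----------------------------------------
$E \vdash \mathrm{close}\ a : \tau\ \mathit{ref}(\epsilon)$

Just as in the last case, let $r'$ be a new region variable out of reach of $\varphi$ and not free in $E$
or $\tau$. Define a new substitution $\varphi' = \varphi \circ \{r \mapsto r'\}$. Now we have,

$\varphi'(E) \vdash a : \varphi'(\tau\ \mathit{ref}(r))$    by induction hypothesis
$\Longrightarrow \varphi(E) \vdash a : (\varphi(\tau))\ \mathit{ref}(r')$    since $r \notin (\mathcal{F}(E) \cup \mathcal{F}(\tau))$    (3.23)

It is also clear that $r' \notin (\mathcal{F}(\varphi'(E)) \cup \mathcal{F}(\varphi'(\tau)))$ since $r \notin (\mathcal{F}(E) \cup \mathcal{F}(\tau))$ and $r'$ was
chosen to be out of reach of $\varphi$. Therefore, we can apply the CLOSE rule to the induction
hypothesis 3.23 to obtain the desired result.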


The following proposition states that a typing remains valid under a more general typing 
environment. 


Proposition 3.10 Let $a$ be an expression, $\tau$ be a type, and $E$, $E'$ be two typing environments
such that $\mathit{Dom}(E) = \mathit{Dom}(E')$, and $E'(x) \ge E(x)$ for all $x$ free in $a$. If $E \vdash a : \tau$, then
$E' \vdash a : \tau$.


Proof: by simple structural induction over $a$. The base case for the IDENT rule follows directly
from the hypothesis, since $E'(x) \ge E(x) \ge \tau$. That is, any instance of $E(x)$ is also an instance
of $E'(x)$. For the LET and CLOSE rules, we observe that $\mathcal{F}(E') \subseteq \mathcal{F}(E)$. In the LET rule,
this implies that $\mathit{Gen}(E', \tau_1) \ge \mathit{Gen}(E, \tau_1)$ and the result follows by applying the induction
hypothesis to the second antecedent. For the CLOSE rule, this implies that $r \notin \mathcal{F}(E')$ and
the result follows directly.


3.3 Type Soundness


3.3.1 Semantic Model 


In order to show the soundness of the typing judgments generated by the above type system 
with respect to its evaluation rules, first, we must precisely characterize a “consistent” semantic 
relationship between value-domain entities and their corresponding type-domain entities. Since 
values may contain reference locations from the store, we need to define STORE TYPINGS which 
are finite mappings from store locations to types: 


STORE TYPINGS: $S ::= \{l_1 \mapsto \tau_1, \ldots, l_n \mapsto \tau_n\}$


Note that we do not allow type schemes in store typings. This clearly separates the modeling 
of type generalization which is handled entirely via type environments, from the modeling of 
closing a mutable object which is handled entirely via type conversion within the store typing. 
Two store typings may be related by extension: 


Definition 3.11 A store typing $S'$ extends another store typing $S$ if $\mathit{Dom}(S) \subseteq \mathit{Dom}(S')$ and
$S(l) = S'(l)$ for all $l \in \mathit{Dom}(S)$.
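Definition 3.11 amounts to a pointwise agreement check on the smaller domain. A minimal sketch follows, assuming store typings are modeled as Python dictionaries from location names to types written as plain strings; the function name `extends` and the sample locations are hypothetical.

```python
def extends(s_new, s_old):
    """True if s_new extends s_old: same type at every location of s_old."""
    return all(loc in s_new and s_new[loc] == ty for loc, ty in s_old.items())


S = {"l1": "int ref(r)"}
S_bigger = {"l1": "int ref(r)", "l2": "bool ref(eps)"}
S_changed = {"l1": "int ref(eps)", "l2": "bool ref(eps)"}

print(extends(S_bigger, S))   # new location added, old one unchanged
print(extends(S_changed, S))  # type of l1 differs from the one in S
```

Under this reading, adding fresh locations always yields an extension, while changing the recorded type of an existing location does not.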


Now, we define the following consistency relationships between value-domain entities and 
type-domain entities: 


Definition 3.12 (Semantic Model) Let $s$ be a store, $S$ be a store typing, $e$ be an environment,
$E$ be a type environment, $v$ be a value, $\tau$ be a type, and $\sigma$ be a type scheme. Define the
following relations:

Case 1: $S \models v : \tau$. The value $v$ belongs to the type $\tau$ under the store typing $S$. The various
cases are as follows:

SubCase 1.1: $S \models c : \mathit{typeof}(c)$, where typeof is a predefined relation between predefined
constants and their types.

SubCase 1.2: $S \models (\text{n-tup}\ v_1, \ldots, v_n) : (\tau_1, \ldots, \tau_n)$, if for all $i$, $S \models v_i : \tau_i$.

SubCase 1.3: $S \models (\mathrm{clsr}\ f, x, a, e) : \tau_1 \xrightarrow{\langle\pi\rangle} \tau_2$, if there exists a type environment $E$ such
that $S \models e : E$ and $E \vdash (f\ \mathrm{where}\ f(x) = a) : \tau_1 \xrightarrow{\langle\pi\rangle} \tau_2$.

SubCase 1.4: $S \models l : \tau\ \mathit{ref}(r)$, if $l \in \mathit{Dom}(S)$ and $S(l) = \tau\ \mathit{ref}(r)$.

SubCase 1.5: $S \models l : \tau\ \mathit{ref}(\epsilon)$, if $l \in \mathit{Dom}(S)$ and there exists a substitution $\varphi$ with
$\mathit{Dom}(\varphi) \subseteq \mathcal{F}(S(l)) \setminus \mathcal{D}(S(l))$ such that $\varphi(S(l)) = \tau\ \mathit{ref}(\epsilon)$.

Case 2: $S \models v : \sigma$. The value $v$ belongs to the type scheme $\sigma = \forall \alpha_1 \ldots \alpha_n.\ \tau$ under the
store typing $S$, if none of the $\alpha_i$ belong to $\mathcal{D}(\tau)$ and if $S \models v : \varphi(\tau)$ for all substitutions $\varphi$ with
$\mathit{Dom}(\varphi) \subseteq \{\alpha_1 \ldots \alpha_n\}$.

Case 3: $S \models e : E$. The values contained in the environment $e$ belong to the corresponding
type schemes in the type environment $E$ (pointwise) under the store typing $S$, if $\mathit{Dom}(e) =
\mathit{Dom}(E)$ and for all $x \in \mathit{Dom}(E)$ we have $S \models e(x) : E(x)$.

Case 4: $\models s : S$. The values contained in the store $s$ belong to the corresponding types in the
store typing $S$ (pointwise), if $\mathit{Dom}(s) = \mathit{Dom}(S)$ and for all $l \in \mathit{Dom}(S)$ we have,

SubCase 4.1: If $S(l) = \tau\ \mathit{ref}(r)$ then $s(l) = v, \mathrm{rw}$ and $S \models v : \tau$.

SubCase 4.2: If $S(l) = \tau\ \mathit{ref}(\epsilon)$ then $s(l) = v, \mathrm{ro}$ and $S \models v : \tau$.


The primary motivation of “closing” a mutable object is to be able to generalize its type to 
a type scheme and use it like any other functional value in a safe manner. This is modeled in 
Case 1.5 by defining a closed location to be consistent with any type obtained via a substitution 
over the non-dangerous variables of the type present in the store typing. On the other hand 
in Case 1.4, a mutable location is defined to be consistent only with the exact type present 
in the store typing, modeling the fact that it is allowed to have only a monomorphic type. 
The one-to-one correspondence between dynamic mutability of a reference location and its type 
is reflected in Cases 4.1 and 4.2. Only the locations with a read/write tag are defined to be 
consistent with a mutable reference type and vice-versa. 


3.3.2 Properties of the Semantic Model 


During the course of evaluation of a program, the values contained within the store locations
may change but the types of those locations remain the same (except for the types of locations
that are currently being closed). This fact is useful in showing that a semantic relation such as
$S \models v : \tau$ that holds true at some point during evaluation remains true afterwards under any
extension of the current store typing:


Proposition 3.13 (Store Typing Extension) If $S'$ extends $S$, then $S \models v : \tau$ implies
$S' \models v : \tau$. Similarly, $S \models e : E$ implies $S' \models e : E$.


Proof: by a simple induction over v. The only interesting case is that for locations. The 
definition of extension ensures that S and S’ must agree exactly on the types of the locations 
that are present in S. 




3.3.3 Type Soundness


Before we establish the consistency of the static and the dynamic semantics in terms of a 
soundness theorem, it is useful to characterize the semantic meaning of the generalization and 
the closing operations in terms of the above semantic definitions. 

The following proposition establishes the fact that it is semantically safe to generalize the 
non-dangerous variables of a type. 


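Proposition 3.14 (Semantic Generalization) Let $v$ be a value, $\tau$ be a type and $S$ be a store
typing such that $S \models v : \tau$. Let $\alpha_1, \ldots, \alpha_m$ be type variables such that for all $i$, $\alpha_i \notin \mathcal{D}(\tau)$. Then,
for all substitutions $\varphi$ with $\mathit{Dom}(\varphi) \subseteq \{\alpha_1 \ldots \alpha_m\}$, we have $S \models v : \varphi(\tau)$. As a consequence,
$S \models v : \forall \alpha_1 \ldots \alpha_m.\ \tau$.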


Proof: by structural induction over $v$. Only the case for a closed location is different from
[Ler92], but we show all cases for the sake of completeness.

Case 1: $v$ is $c$. By definition, $S \models c : \mathit{typeof}(c)$ and therefore we must have $\mathit{typeof}(c) \ge \tau$
using the hypothesis $S \models c : \tau$. Also by assumption, all predefined constants possess fully
quantified type schemes, i.e., their type schemes do not contain any free type variables.
This implies $\mathit{typeof}(c) \ge \varphi(\tau)$ and the result $S \models c : \varphi(\tau)$ follows immediately.


Case 2: $v$ is $(\text{n-tup}\ v_1, \ldots, v_n)$ and $\tau$ is $\tau_1, \ldots, \tau_n$. Since $\mathcal{D}(\tau_1, \ldots, \tau_n) = \bigcup_{1 \le j \le n} \mathcal{D}(\tau_j)$,
we must have for all $i, j$ that $\alpha_i \notin \mathcal{D}(\tau_j)$. By the induction hypothesis it follows that for all $j$,
$S \models v_j : \varphi(\tau_j)$. The result follows from the definition of $\models$ for tuples.


Case 3: $v$ is $(\mathrm{clsr}\ f, x, a, e)$ and $\tau$ is $\tau_1 \xrightarrow{\langle\pi\rangle} \tau_2$. Applying the definition of $\models$ for closures,
let $E$ be the type environment such that,

$S \models e : E$    (3.24)
$E \vdash (f\ \mathrm{where}\ f(x) = a) : \tau_1 \xrightarrow{\langle\pi\rangle} \tau_2$    (3.25)

We will show that,

$S \models e : \varphi(E)$    (3.26)
$\varphi(E) \vdash (f\ \mathrm{where}\ f(x) = a) : \varphi(\tau_1 \xrightarrow{\langle\pi\rangle} \tau_2)$    (3.27)

Equation 3.27 follows directly from Equation 3.25 using Proposition 3.9 that typing is
stable under substitution. Also note that $\mathit{Dom}(E) = \mathit{Dom}(e) = \mathcal{F}(f\ \mathrm{where}\ f(x) = a)$
from Equation 3.24 and the dynamic ABS rule in Figure 3.1.

In order to show Equation 3.26, we must show $S \models e(y) : \varphi(E(y))$ for all $y \in \mathit{Dom}(E)$.
For a given $y$, let $E(y) = \forall \beta_1 \ldots \beta_k.\ \tau'$ where the $\beta_j$ are taken out of reach of $\varphi$ and distinct
from the $\alpha_i$. Using substitution over type schemes, we obtain $\varphi(E(y)) = \forall \beta_1 \ldots \beta_k.\ \varphi(\tau')$.
Thus, in order to conclude $S \models e(y) : \varphi(E(y))$, first we have to show $S \models e(y) : \psi(\varphi(\tau'))$
for any substitution $\psi$ with $\mathit{Dom}(\psi) \subseteq \{\beta_1 \ldots \beta_k\}$. This is done as follows.

From Equation 3.24 we obtain $S \models e(y) : E(y)$ which implies $S \models e(y) : \tau'$ using the
definition of $\models$ over type schemes. Now consider the substitution $\psi \circ \varphi$. Its domain is
$\{\alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_k\}$. We claim that none of these variables are dangerous in $\tau'$:

• $\alpha_i \notin \mathcal{D}(\tau')$: We know that $y \in \mathcal{F}(f\ \mathrm{where}\ f(x) = a)$, so its type scheme $E(y)$
is included in the closure type $\pi$ of $\tau = \tau_1 \xrightarrow{\langle\pi\rangle} \tau_2$. This implies that $\mathcal{D}(E(y)) =
\mathcal{D}(\tau') \setminus \{\beta_1 \ldots \beta_k\}$ is included in $\mathcal{D}(\pi) = \mathcal{D}(\tau)$. Since $\alpha_i \notin \mathcal{D}(\tau)$ by hypothesis, it
follows that $\alpha_i \notin \mathcal{D}(\tau')$ for all $i$.
• $\beta_j \notin \mathcal{D}(\tau')$: $S \models e(y) : E(y)$ from Equation 3.24 immediately implies $\beta_j \notin \mathcal{D}(\tau')$ for
all $j$.

Now we can apply the induction hypothesis to the value $e(y)$, the type $\tau'$, the variables
$\alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_k$ and the substitution $\psi \circ \varphi$ to obtain $S \models e(y) : \psi(\varphi(\tau'))$. This
holds for any substitution $\psi$ over $\{\beta_1 \ldots \beta_k\}$. Moreover, none of the $\beta_j$ are dangerous in $\varphi(\tau')$
since they are not dangerous in $\tau'$ and they are out of reach of $\varphi$. Therefore we obtain
$S \models e(y) : \forall \beta_1 \ldots \beta_k.\ \varphi(\tau')$ by definition of $\models$ over type schemes, that is, $S \models e(y) : \varphi(E(y))$.
This holds for all $y \in \mathit{Dom}(E)$. Hence Equation 3.26 is satisfied and we obtain the desired
result.


Case 4: $v$ is $l$ and $\tau$ is $\tau'\ \mathit{ref}(r)$. Here, $\mathcal{D}(\tau) = \mathcal{F}(\tau)$. Since no $\alpha_i$ is dangerous in $\tau$ by
hypothesis, it follows that no $\alpha_i$ can be free in $\tau$. Thus, $\varphi(\tau) = \tau$ and the result follows
immediately from the hypothesis that $S \models v : \tau$.


Case 5: $v$ is $l$ and $\tau$ is $\tau'\ \mathit{ref}(\epsilon)$. Applying the definition of $\models$ for non-mutable locations,
let $\psi$ be the substitution with $\mathit{Dom}(\psi) \subseteq \mathcal{F}(S(l)) \setminus \mathcal{D}(S(l))$ such that $\tau = \psi(S(l))$, thereby
implying $\varphi(\tau) = \varphi(\psi(S(l)))$. Also, no $\alpha_i \in \mathit{Dom}(\varphi)$ is dangerous in $S(l)$. Otherwise,
it would surely be dangerous in $\tau = \psi(S(l))$ from Proposition 3.8, which contradicts the
hypothesis.

Consider the substitution $\varphi \circ \psi$ restricted to the domain $X = \mathcal{F}(S(l)) \setminus \mathcal{D}(S(l))$. From
the above remarks it is clear that we still have $(\varphi \circ \psi)|_X (S(l)) = \varphi(\tau)$. Thus, we can
apply the definition of $\models$ for the location $l$ using the substitution $(\varphi \circ \psi)|_X$ to conclude
$S \models l : \varphi(\tau)$ as desired.


The following proposition establishes a correspondence between the dangerous regions of 
a type and the mutable locations that are reachable from a value possessing that type. This 
allows us to use dangerous regions as a safe static abstraction for mutable locations. 


Proposition 3.15 (Region Abstraction) Let $s$ be a store, and $S$ be a store typing such that
$\models s : S$. Then we have,

$$S \models v : \tau \;\Longrightarrow\; \Big( \bigcup_{l \in Reachable(v,s)} R(S(l)) \Big) \subseteq R(\tau)$$

That is, the dangerous regions contained in the types of reachable locations of a value are
dangerous in the type of the value. Using pointwise extension to environments we also have,

$$S \models e : E \;\Longrightarrow\; \Big( \bigcup_{l \in Reachable(e,s)} R(S(l)) \Big) \subseteq R(E)$$


Proof: by induction on the depth of reachability of a location $l$ in the value $v$. First, we define
a family of reachability functions $Reachable^i(v, s)$ as follows:

$$Reachable^0(v,s) = \mathcal{L}(v) \qquad (3.28)$$

$$Reachable^{i+1}(v,s) = Reachable^i(v,s) \;\cup \bigcup_{l \in Reachable^i(v,s)} \mathcal{L}(value(s(l))) \qquad (3.29)$$

By definition, $Reachable(v, s)$ is the limit of the increasing chain of sets $Reachable^0(v,s) \subseteq
Reachable^1(v,s) \subseteq \cdots$. Since the number of locations reachable from a value is finite, this
chain is guaranteed to reach the limit at a finite $i$. We will show that for all $i$,

$$\Big( \bigcup_{l \in Reachable^i(v,s)} R(S(l)) \Big) \subseteq R(\tau) \qquad (3.30)$$
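The chain of sets $Reachable^i(v,s)$ can be computed by a straightforward fixed-point iteration. The following Python sketch is an illustration only and is not part of the formal development: the classes `Loc` and `Closure`, the representation of a store as a dictionary from locations to `(value, tag)` pairs, and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """A store location (hypothetical representation)."""
    name: str


@dataclass(frozen=True)
class Closure:
    """A function closure; only its captured values matter here."""
    env: tuple


def direct_locations(v):
    """L(v): locations occurring in v itself, without consulting the store."""
    if isinstance(v, Loc):
        return {v}
    if isinstance(v, tuple):
        return set().union(*(direct_locations(x) for x in v))
    if isinstance(v, Closure):
        return direct_locations(v.env)
    return set()  # constants contain no locations


def reachable(v, store):
    """Limit of the chain Reachable^0, Reachable^1, ... (Equations 3.28, 3.29)."""
    current = direct_locations(v)            # Reachable^0(v, s) = L(v)
    while True:
        nxt = set(current)
        for loc in current:
            value, _tag = store[loc]
            nxt |= direct_locations(value)   # add L(value(s(l)))
        if nxt == current:                   # the chain has stabilised
            return current
        current = nxt
```

Because the store is finite, each iteration either adds at least one new location or stops, which mirrors the finiteness argument used in the proof.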


Base Case: Using Equation 3.28, we need to show that for all locations $l \in \mathcal{L}(v)$, we have
$R(S(l)) \subseteq R(\tau)$. This is shown by induction on the structure of $v$ using the definition of
$S \models v : \tau$.


Case 1: v is c — Trivial, since there are no locations reachable from a constant. 


Case 2: $v$ is $(\text{n-tup}\ v_1, \ldots, v_n)$ and $\tau$ is $\tau_1, \ldots, \tau_n$. Follows immediately from the
definition of $\models$ for tuples and the induction hypothesis for each $v_i$.

Case 3: $v$ is $(\text{clsr}\ f, x, a, e)$ and $\tau$ is $\tau_1 \xrightarrow{\pi} \tau_2$. From the definition of $\models$ for closures
we obtain that there exists a type environment $E$ such that,

$$S \models e : E \quad\text{and}\quad E \vdash (f \text{ where } f(x) = a) : \tau_1 \xrightarrow{\pi} \tau_2 \qquad (3.31)$$

Applying the definition of reachability for $(\text{clsr}\ f, x, a, e)$ and the induction hypothesis
for environments we obtain:

$$\Big( \bigcup_{l \in Reachable(v,s)} R(S(l)) \Big) = \Big( \bigcup_{l \in Reachable(e,s)} R(S(l)) \Big) \subseteq R(E) \qquad (3.32)$$

The desired result follows by noticing that $R(E) \subseteq R(\tau)$ since $Dom(E) = Dom(e) =
F(f \text{ where } f(x) = a)$ and all the type schemes in $CoDom(E)$ are included in the closure
type of $\tau$ by construction.

Case 4: $v$ is $l$ and $\tau$ is $\tau'\ \mathrm{ref}(r)$. Follows immediately from the definition of $\models$ for
mutable locations since $S(l) = \tau$.

Case 5: $v$ is $l$ and $\tau$ is $\tau'\ \mathrm{ref}(\epsilon)$. From the definition of $\models$ for non-mutable locations
we have $\psi(S(l)) = \tau$. But the domain of $\psi$ does not include any dangerous variables of
$S(l)$, so we must have $R(S(l)) \subseteq R(\psi(S(l))) = R(\tau)$ as desired.


Induction Case: We assume the hypothesis for $i$,

$$\Big( \bigcup_{l \in Reachable^i(v,s)} R(S(l)) \Big) \subseteq R(\tau) \qquad (3.33)$$

From Equation 3.29, the locations in $Reachable^i(v, s)$ are already covered via the above
hypothesis. Given a location $l \in Reachable^i(v, s)$, let $value(s(l)) = v'$ and $S(l) = \tau'\ \mathrm{ref}(\rho)$.
Using hypothesis $\models s : S$, we have $S \models v' : \tau'$. Therefore, we can apply the base case
induction as above and obtain for all $l' \in \mathcal{L}(v')$, $R(S(l')) \subseteq R(\tau')$. This immediately extends
to $R(S(l')) \subseteq R(\tau)$, since $\tau'$ is contained in $S(l)$ and $R(S(l)) \subseteq R(\tau)$ from Equation 3.33.


The semantic consistency between the static and the dynamic semantics can now be stated 
in the form of the soundness theorem given below. It is proved using induction on the size of 
evaluation derivation, doing a case analysis on a and hence on the last rule used in the typing 
derivation. 
The soundness of the close operation relies on the fact that it only closes fresh and
non-escaping locations, i.e., locations that are neither present in the initial store, nor are 
accessible from the current environment or the returned result. The former is a property of the 
dynamic rules (Proposition 3.5) and the latter is ensured by the side condition on the static 
CLOSE rule and Proposition 3.15. 


Theorem 3.16 (Type Soundness) Let $a$ be an expression, $\tau$ be a type, $E$ be a type
environment, $e$ be an evaluation environment, $s$ be an initial store, and $S$ be a store typing such
that:

$$E \vdash a : \tau \quad\text{and}\quad S \models e : E \quad\text{and}\quad \models s : S$$

If there exists a result $r$ such that $e \vdash a/s \Rightarrow r$, then $r \neq \mathrm{err}$; instead $r = v/s'$ for some value
$v$ and a resulting store $s'$, and there exists a store typing $S'$ such that:

$$S' \text{ extends } S \quad\text{and}\quad S' \models v : \tau \quad\text{and}\quad \models s' : S'$$


Proof: by induction on the size of evaluation derivation. We argue by case analysis on a and 
hence on the last rule used in the typing derivation. Again, only the case for the CLOSE rule 
is different from [Ler92], but we show all cases. 

Case 1: Constants. The typing rule is:

$$\frac{typeof(c) \succeq \tau}{E \vdash c : \tau}$$

The only possible evaluation is $e \vdash c/s \Rightarrow c/s$. By definition of $\models$ for constants, we have
$S \models c : typeof(c)$ which implies $S \models c : \tau$ since $typeof(c) \succeq \tau$. We conclude with $S' = S$.

Case 2: Variables. The typing rule is:

$$\frac{x \in Dom(E) \qquad E(x) \succeq \tau}{E \vdash x : \tau}$$

From hypothesis $S \models e : E$ it follows that $x \in Dom(e)$ and $S \models e(x) : E(x)$. Thus,
the only possible evaluation is $e \vdash x/s \Rightarrow e(x)/s$. By definition of $\models$ for type schemes,
$S \models e(x) : E(x)$ implies $S \models e(x) : \tau$. We conclude with $S' = S$.

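Case 3: Function Abstraction. The typing rule is:

$$\frac{\{y_1 \ldots y_n\} = F(f \text{ where } f(x) = a) \qquad E + \{f \mapsto \tau_1 \xrightarrow{E(y_1), \ldots, E(y_n), \pi} \tau_2,\; x \mapsto \tau_1\} \vdash a : \tau_2}{E \vdash (f \text{ where } f(x) = a) : \tau_1 \xrightarrow{E(y_1), \ldots, E(y_n), \pi} \tau_2}$$

The only possible evaluation is $e \vdash (f \text{ where } f(x) = a)/s \Rightarrow (\text{clsr}\ f, x, a, e|_Y)/s$ where
$Y = \{y_1 \ldots y_n\}$. Using the definition of $\models$ for closures, we have $S \models (\text{clsr}\ f, x, a, e|_Y) :
\tau_1 \xrightarrow{E(y_1), \ldots, E(y_n), \pi} \tau_2$, taking $E|_Y$ to be the desired type environment. We conclude with $S' = S$.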


Case 4: Function Application. The typing rule is:

$$\frac{E \vdash a_1 : \tau_1 \xrightarrow{\pi} \tau_2 \qquad E \vdash a_2 : \tau_1}{E \vdash a_1\ a_2 : \tau_2}$$

We claim that evaluations leading to err are not possible and that the following evaluation
rule applies:

$$\frac{\begin{array}{c} e \vdash a_1/s \Rightarrow (\text{clsr}\ f, x, a_0, e_0)/s_1 \\ e \vdash a_2/s_1 \Rightarrow v_2/s_2 \\ e_0 + \{f \mapsto (\text{clsr}\ f, x, a_0, e_0),\; x \mapsto v_2\} \vdash a_0/s_2 \Rightarrow v/s' \end{array}}{e \vdash (a_1\ a_2)/s \Rightarrow v/s'}$$

This is shown as follows:

Using the induction hypothesis on $a_1$, we obtain that it cannot evaluate to err; instead
it must evaluate to a closure, $e \vdash a_1/s \Rightarrow (\text{clsr}\ f, x, a_0, e_0)/s_1$ with a store typing $S_1$ such
that:

$$S_1 \models (\text{clsr}\ f, x, a_0, e_0) : \tau_1 \xrightarrow{\pi} \tau_2 \quad\text{and}\quad \models s_1 : S_1 \quad\text{and}\quad S_1 \text{ extends } S \qquad (3.34)$$

Since $S_1$ extends $S$, we have $S_1 \models e : E$ from Proposition 3.13. Thus, we can use the
induction hypothesis on $a_2$ with store $s_1 : S_1$ and obtain that it evaluates to a proper value
as well, $e \vdash a_2/s_1 \Rightarrow v_2/s_2$ with a store typing $S_2$ such that:

$$S_2 \models v_2 : \tau_1 \quad\text{and}\quad \models s_2 : S_2 \quad\text{and}\quad S_2 \text{ extends } S_1 \qquad (3.35)$$

Applying the definition of $\models$ to the first clause in Equation 3.34, we obtain that there
exists a type environment $E_0$ such that:

$$S_1 \models e_0 : E_0 \qquad (3.36)$$
$$\text{and}\quad E_0 \vdash (f \text{ where } f(x) = a_0) : \tau_1 \xrightarrow{\pi} \tau_2$$
$$\Longrightarrow\quad E_0 + \{f \mapsto \tau_1 \xrightarrow{\pi} \tau_2,\; x \mapsto \tau_1\} \vdash a_0 : \tau_2 \qquad (3.37)$$

Now consider the following environments:

$$e_2 = e_0 + \{f \mapsto (\text{clsr}\ f, x, a_0, e_0),\; x \mapsto v_2\} \quad\text{and}\quad E_2 = E_0 + \{f \mapsto \tau_1 \xrightarrow{\pi} \tau_2,\; x \mapsto \tau_1\}$$

Using Proposition 3.13 and Equations 3.34, 3.35, and 3.36, we obtain $S_2 \models e_2 : E_2$.
Therefore, we can apply the induction hypothesis to the typing judgment 3.37 and the
store $s_2 : S_2$. We obtain the evaluation $e_2 \vdash a_0/s_2 \Rightarrow v/s'$ with a store typing $S'$ such that:

$$S' \models v : \tau_2 \quad\text{and}\quad \models s' : S' \quad\text{and}\quad S' \text{ extends } S_2 \qquad (3.38)$$

This shows that $a_0$ in the third premise of the evaluation rule given above also evaluates
to a proper value and we obtain the desired result since $S'$ extends $S$ by transitivity.
Case 5: Tuple Construction. Same argument as above.

Case 6: let-binding. The typing rule is:

$$\frac{E \vdash a_1 : \tau_1 \qquad E + \{x \mapsto Gen(E, \tau_1)\} \vdash a_2 : \tau_2}{E \vdash \text{let } x = a_1 \text{ in } a_2 : \tau_2}$$

Again, we claim that evaluations leading to err are not possible and the last step in the
evaluation derivation is:

$$\frac{e \vdash a_1/s \Rightarrow v_1/s_1 \qquad e + \{x \mapsto v_1\} \vdash a_2/s_1 \Rightarrow v_2/s'}{e \vdash (\text{let } x = a_1 \text{ in } a_2)/s \Rightarrow v_2/s'}$$

This is shown as follows:

Using the induction hypothesis on $a_1$, we obtain that it does not evaluate to err; instead
$e \vdash a_1/s \Rightarrow v_1/s_1$ with the store typing $S_1$ such that:

$$S_1 \models v_1 : \tau_1 \quad\text{and}\quad \models s_1 : S_1 \quad\text{and}\quad S_1 \text{ extends } S \qquad (3.39)$$

Using Proposition 3.14, we have $S_1 \models v_1 : Gen(E, \tau_1)$ since the $Gen$ operator does not
generalize any dangerous variables in $\tau_1$. Now, consider the following environments:

$$e_1 = e + \{x \mapsto v_1\} \quad\text{and}\quad E_1 = E + \{x \mapsto Gen(E, \tau_1)\}$$

Since $S_1$ extends $S$, we obtain $S_1 \models e_1 : E_1$. Therefore, we can apply the induction
hypothesis to the second premise of the typing rule and the store $s_1 : S_1$ to obtain
$e_1 \vdash a_2/s_1 \Rightarrow v_2/s'$ with the store typing $S'$ such that:

$$S' \models v_2 : \tau_2 \quad\text{and}\quad \models s' : S' \quad\text{and}\quad S' \text{ extends } S_1 \qquad (3.40)$$

This is the desired result.
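Case 7: Reference Creation. The PRIMAPP typing rule instantiates to:

$$\frac{\forall t, u, r.\ t \xrightarrow{u} t\ \mathrm{ref}(r) \;\succeq\; \tau \xrightarrow{\pi} \tau\ \mathrm{ref}(r) \qquad E \vdash a : \tau}{E \vdash \mathrm{ref}(a) : \tau\ \mathrm{ref}(r)}$$

The evaluation must end up with:

$$\frac{e \vdash a/s \Rightarrow v/s_1 \qquad l \notin Dom(s_1)}{e \vdash \mathrm{ref}(a)/s \Rightarrow l/(s_1 + \{l \mapsto v, rw\})}$$

By induction hypothesis applied to $a$, we obtain a store typing $S_1$ such that:

$$S_1 \models v : \tau \quad\text{and}\quad \models s_1 : S_1 \quad\text{and}\quad S_1 \text{ extends } S \qquad (3.41)$$

Let us define,

$$s' = s_1 + \{l \mapsto v, rw\} \quad\text{and}\quad S' = S_1 + \{l \mapsto \tau\ \mathrm{ref}(r)\}$$

Since $Dom(s_1) = Dom(S_1)$, we have $l \notin Dom(S_1)$. Hence, $S'$ extends $S_1$ and therefore $S$.
Using this fact on the first clause of Equation 3.41, we obtain $S' \models v : \tau$, which allows us
to conclude from the definition of $\models$ that $S' \models l : \tau\ \mathrm{ref}(r)$ and $\models s' : S'$.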

Case 8: Dereferencing. We show the case for dereferencing a non-mutable location. The
case of dereferencing a mutable location is similar. The PRIMAPP typing rule instantiates
to:

$$\frac{\forall t, u.\ t\ \mathrm{ref}(\epsilon) \xrightarrow{u} t \;\succeq\; \tau\ \mathrm{ref}(\epsilon) \xrightarrow{\pi} \tau \qquad E \vdash a : \tau\ \mathrm{ref}(\epsilon)}{E \vdash\ !a : \tau}$$

By induction hypothesis applied to $a$, we obtain that it must evaluate to a location,
$e \vdash a/s \Rightarrow l/s_1$, with a store typing $S_1$ such that:

$$S_1 \models l : \tau\ \mathrm{ref}(\epsilon) \quad\text{and}\quad \models s_1 : S_1 \quad\text{and}\quad S_1 \text{ extends } S \qquad (3.42)$$

Also, $l \in Dom(s_1)$ because the first clause above implies that $l \in Dom(S_1)$ and $Dom(s_1) =
Dom(S_1)$ from the second clause. Thus, the only possibility for evaluation is:

$$\frac{e \vdash a/s \Rightarrow l/s_1 \qquad l \in Dom(s_1) \qquad value(s_1(l)) = v}{e \vdash\ !a/s \Rightarrow v/s_1}$$

Applying the definition of $\models$ for non-mutable locations to the first clause in Equation 3.42,
we obtain that there exists a substitution $\psi$ with $Dom(\psi) \subseteq F(S_1(l)) \setminus D(S_1(l))$ such that
$\psi(S_1(l)) = \tau\ \mathrm{ref}(\epsilon)$. Thus, $S_1(l)$ must be of the form:

$$S_1(l) = \tau'\ \mathrm{ref}(\epsilon) \quad\text{with}\quad \psi(\tau') = \tau$$

This is because all locations must have reference types and we never substitute the null
region for region variables. From the definition of $\models s_1 : S_1$ for location $l$ it follows that
$S_1 \models v : \tau'$. Since $Dom(\psi)$ does not include any dangerous variables in $S_1(l)$ and hence in
$\tau'$, we can apply Proposition 3.14 to the substitution $\psi$ and obtain $S_1 \models v : \psi(\tau')$. This is the
desired result taking $S' = S_1$.
Case 9: Assignment. The PRIMAPP typing rule instantiates to:

$$\frac{\forall t, u, r.\ (t\ \mathrm{ref}(r), t) \xrightarrow{u} \mathrm{unit} \;\succeq\; (\tau\ \mathrm{ref}(r), \tau) \xrightarrow{\pi} \mathrm{unit} \qquad E \vdash a : \tau\ \mathrm{ref}(r), \tau}{E \vdash\ :=(a) : \mathrm{unit}}$$

As in the previous case, the evaluation must end up with:

$$\frac{e \vdash a/s \Rightarrow (l, v)/s_1 \qquad l \in Dom(s_1) \qquad tag(s_1(l)) = rw}{e \vdash\ :=(a)/s \Rightarrow ()/(s_1 + \{l \mapsto v, rw\})}$$

By induction hypothesis applied to $a$, we get a store typing $S_1$ such that:

$$S_1 \models (l, v) : \tau\ \mathrm{ref}(r), \tau \quad\text{and}\quad \models s_1 : S_1 \quad\text{and}\quad S_1 \text{ extends } S \qquad (3.43)$$

This implies $S_1 \models v : \tau$ and $S_1(l) = \tau\ \mathrm{ref}(r)$ from the definition of $\models$ for tuples and
mutable locations. Letting $S' = S_1$ and $s' = s_1 + \{l \mapsto v, rw\}$, we therefore obtain $\models s' : S'$
using the definition of $\models$. Finally, we check that $S' \models () : \mathrm{unit}$ and obtain the desired
result.


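Case 10: close expression. The typing rule is (premises reconstructed from the side conditions cited in the proof below):

$$\frac{E \vdash a : \tau\ \mathrm{ref}(r) \qquad r \notin F(E) \qquad r \notin F(\tau)}{E \vdash \text{close } a : \tau\ \mathrm{ref}(\epsilon)}$$

Using the induction hypothesis on $a$, we obtain $e \vdash a/s \Rightarrow l/s_1$ with the store typing $S_1$
such that:

$$S_1 \models l : \tau\ \mathrm{ref}(r) \quad\text{and}\quad \models s_1 : S_1 \quad\text{and}\quad S_1 \text{ extends } S \qquad (3.44)$$

From the first two clauses above and the definition of $\models$ for mutable locations, we obtain,

$$l \in Dom(S_1) \qquad S_1(l) = \tau\ \mathrm{ref}(r) \qquad s_1(l) = v, rw \quad\text{and}\quad S_1 \models v : \tau \qquad (3.45)$$

Thus, the CLOSE evaluation rule applies:

$$\frac{\begin{array}{c} e \vdash a/s \Rightarrow l/s_1 \qquad s_1(l) = v, rw \\ L = Reachable(l, s_1) \cup Reachable(e, s_1) \cup \bigcup_{l' \in Dom(s)} Reachable(l', s_1) \end{array}}{e \vdash (\text{close } a)/s \Rightarrow l/(s_1|_L + \{l \mapsto v, ro\})}$$

Let us now define,

$$s' = s_1|_L + \{l \mapsto v, ro\} \quad\text{and}\quad S' = S_1|_L + \{l \mapsto \tau\ \mathrm{ref}(\epsilon)\} \qquad (3.46)$$

Now, we have to show the following:

$$S' \models l : \tau\ \mathrm{ref}(\epsilon) \quad\text{and}\quad \models s' : S' \quad\text{and}\quad S' \text{ extends } S \qquad (3.47)$$

The first clause follows directly from the definition of $\models$ for non-mutable locations since
we have chosen $l \in Dom(S')$ and $S'(l) = \tau\ \mathrm{ref}(\epsilon)$.

Next, we show that $S'$ extends $S$. Note that $S_1$ extends $S$ from Equation 3.44 and
$Dom(S) \subseteq Dom(S')$ by construction. Therefore, $S'$ will extend $S$ if $l \notin Dom(S)$, since
that is the only location at which $S_1$ and $S'$ differ. This is shown as follows.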
Suppose for the moment that $l \in Dom(S)$. Since $\models s : S$ by hypothesis, we have
$l \in Dom(s)$. Applying Proposition 3.5 to the evaluation $e \vdash a/s \Rightarrow l/s_1$ we conclude that
$l \in Reachable(e, s)$. Also, since $S_1$ extends $S$, we obtain that $S(l) = S_1(l) = \tau\ \mathrm{ref}(r)$.
Finally, using Proposition 3.15 for the hypothesis $S \models e : E$, we must have $r \in R(S(l)) \subseteq
R(E) \subseteq F(E)$ which contradicts the condition $r \notin F(E)$ in the typing rule.


As the final step in proving Equation 3.47, we have to show $\models s' : S'$. By construction,
we have $Dom(s') = Dom(S')$ and at location $l$, $s'(l)$ has the read-only tag which is consistent
with $S'(l)$ pointing to a null region. At locations $l' \in Dom(S')$ other than $l$, the tags
in $s'$ are already consistent with the corresponding regions in $S'$ since they are directly
copied from $s_1 : S_1$. Next, we have to show that for all locations $l' \in Dom(S')$ such that
$value(s'(l')) = v'$ and $S'(l') = \tau'\ \mathrm{ref}(\rho)$, we have,

$$S_1 \models v' : \tau' \;\Longrightarrow\; S' \models v' : \tau' \qquad (3.48)$$

This can be shown by a simple structural induction on $v'$. Only the case for locations
is interesting. By construction, the store $s'$ is closed under reachability so there is no
possibility of encountering undefined locations within $v'$, and for locations other than $l$, we
already have $S'(l') = S_1(l')$.

The only problem is if $v'$ contained $l$ (the location being closed), then $\tau'$ would still
contain the region variable $r$ because $S_1(l) = \tau\ \mathrm{ref}(r)$. But this region has been closed in
$S'$, making $S' \models v' : \tau'$ inconsistent. Thus, $l$ should not be contained in $v'$. This is where
the domain restriction on the store $s'$ proves useful. We show below a stronger condition
that the location $l$ is not reachable from any value $v'$ present in the store $s'$. Specifically,
we will show that $l \notin Reachable(v', s_1)$ which implies $l \notin Reachable(v', s')$.


Let us assume for the purpose of contradiction that $l \in Reachable(v', s_1)$. Looking at
the components of $Dom(s')$ given by Equation 3.46, the following possibilities arise for
$l' \in Dom(s')$:

1. $l' = l$. Then $v' = v$ and hence $l \in Reachable(v, s_1)$ by assumption. Now, we
apply Proposition 3.15 to $S_1 \models v : \tau$ taken from Equation 3.45 to conclude that
$r \in R(S_1(l)) \subseteq R(\tau) \subseteq F(\tau)$ which contradicts the condition $r \notin F(\tau)$ in the typing
rule.

2. $l' \neq l$ but $l' \in Reachable(l, s_1)$. This immediately implies $l' \in Reachable(v, s_1)$ since
the location $l$ contains the value $v$ in both $s_1$ and $s'$. Together with the assumption
$l \in Reachable(v', s_1)$ and transitivity of reachability, we obtain $l \in Reachable(v, s_1)$
which leads to a contradiction as shown in the previous case.

3. $l' \in Reachable(e, s_1)$. Using the assumption $l \in Reachable(v', s_1)$ and the fact that
$l'$ contains $v'$ in $s_1$, we obtain by transitivity that $l \in Reachable(e, s_1)$. Applying
Proposition 3.15 to $S_1 \models e : E$ derived from Equation 3.44 and Proposition 3.13, we
conclude that $r \in R(S_1(l)) \subseteq R(E) \subseteq F(E)$ which contradicts the condition $r \notin F(E)$
in the typing rule.

4. $l' \in Reachable(Dom(s), s_1)$. We know that $l$ was not reachable from any value present
in the domain of $s$ initially, i.e., $l \notin Reachable(Dom(s), s)$, because we have already shown
that $l \notin Dom(s)$ while showing that $S'$ extends $S$. Thus, the only way $l$ could become
reachable from $Dom(s)$ after the evaluation $e \vdash a/s \Rightarrow l/s_1$ is if some location
in $Dom(s)$ was assigned a new value from which $l$ was reachable. Without loss of
generality, let us assume that location is $l'$ and the newly assigned value is $v'$, i.e.,

$$\exists\, l' \in Dom(s):\quad value(s(l')) \neq value(s_1(l')) = v' \quad\text{and}\quad l \in Reachable(v', s_1) \qquad (3.49)$$

Since location $l'$ was modified during the evaluation $e \vdash a/s \Rightarrow l/s_1$, we can apply
Proposition 3.6 to conclude that $l' \in Reachable(e, s)$. Applying Proposition 3.15 to
hypothesis $S \models e : E$, we obtain that $R(S(l')) \subseteq R(E)$ which extends to $R(S_1(l')) \subseteq
R(E)$ since $S_1$ extends $S$.

On the other hand, from Equation 3.48 we already have $S_1 \models v' : \tau'$ where $S_1(l') =
\tau'\ \mathrm{ref}(r')$ for some region variable $r'$. Applying Proposition 3.15 in this case for the
location $l \in Reachable(v', s_1)$ we obtain $r \in R(S_1(l)) \subseteq R(\tau') \subseteq R(S_1(l'))$. Combining
this with the result obtained in the last paragraph, we conclude $r \in R(E) \subseteq F(E)$
which contradicts the condition $r \notin F(E)$ in the typing rule.

This proves that $l$ is not contained in any value $v'$ present in the store $s'$, which implies
that $\models s' : S'$. Thus, all the clauses of claim 3.47 are true and we have the desired result.


The soundness theorem immediately leads us to the following corollary that guarantees that 
closed reference locations are never updated. 


Corollary 3.17 (Non-Mutability of Closed Locations) Let $a$ be an expression fragment
within a type correct program $p$ such that $E \vdash (\text{close } a) : \tau\ \mathrm{ref}(\epsilon)$ and $e \vdash (\text{close } a)/s \Rightarrow l/s'$.
Then, the location $l$ is never updated during the evaluation of the rest of the program.


Proof: The dynamic CLOSE rule (Figure 3.1) ensures that $tag(s'(l)) = ro$. The ASSIGN rule
requires a rw tag for the location to be updated, and there is no rule that converts the
tag of a location from ro to rw. Thus, the location $l$ cannot be modified unless the program $p$
illegally attempts to update it, which would produce a dynamic error. The soundness theorem
rules out such an error since the program is well-typed.
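The tag discipline that this proof relies on can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the semantics of Figure 3.1: the store is assumed to be a dictionary from locations to `(value, tag)` pairs, the dynamic error is modelled as a hypothetical `WriteError` exception, and `close_location` only retags a single location (the store restriction performed by the real CLOSE rule is omitted).

```python
class WriteError(Exception):
    """Raised when an assignment targets a location whose tag is not rw."""


def assign(store, loc, value):
    """ASSIGN: permitted only when the location currently carries the rw tag."""
    _old, tag = store[loc]
    if tag != "rw":
        raise WriteError(loc)
    store[loc] = (value, "rw")


def close_location(store, loc):
    """Retag one location as read-only; no operation turns ro back into rw."""
    value, _tag = store[loc]
    store[loc] = (value, "ro")
```

In this model the only way to reach a closed location with `assign` is to trigger `WriteError`; the soundness theorem is what guarantees that a well-typed program never takes that path.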


Corollary 3.17 may be generalized to arbitrary objects with a completely closed type. This 
allows us to conclude that mutable objects, once successfully closed, can no longer be modified 
and therefore behave functionally. 


Corollary 3.18 (Non-Mutability of Closed Objects) Let $a$ be an expression fragment within
a type correct program $p$ such that $E \vdash a : \tau$ where $R(\tau) = \emptyset$ and $e \vdash a/s \Rightarrow v/s'$. Then, no
location $l \in Reachable(v, s')$ is updated during the evaluation of the rest of the program.


Proof: Using the soundness theorem we know that the evaluation of $p$ (and hence $a$) does not
lead to error and there exists a store typing $S'$ such that $S' \models v : \tau$ and $\models s' : S'$. We claim that for all
locations $l \in Reachable(v, s')$ we must have $tag(s'(l)) = ro$. Otherwise, from Definition 3.12
Case 4 it follows that there exists a region variable $r_1$ such that $S'(l) = \tau'\ \mathrm{ref}(r_1)$. Then,
using Proposition 3.15 it follows that $r_1 \in R(\tau)$, which contradicts the hypothesis $R(\tau) = \emptyset$.


Now, sound uses of the ASSIGN rule in Figure 3.1 require that the tag of the location being 
assigned should be rw. Furthermore, there is no rule that converts the tag of a location from 
ro to rw. Therefore, no assignments are possible on any location | € Reachable(v, s') during 
the evaluation of the rest of the program. 


Note that Corollary 3.17 is not a special case of Corollary 3.18 because Corollary 3.17 
guarantees the non-mutability of a single closed location even if the locations reachable from 
within it are mutable. On the other hand, Corollary 3.18 only deals with objects that have 
completely closed types in order to guarantee that none of the locations reachable from them 
are mutable. 
3.4 Type Inference


Finally, our type system admits a type inference algorithm Infer that infers principal types for 
expressions. This algorithm is a direct extension of the one described in Leroy’s thesis [Ler92] 
to region variables. We only need to ensure that region variables are allowed to be unified only
with other region variables and never with the null region ($\epsilon$). This guarantees that we do not
accidentally “close” a mutable reference type by unification. That operation should only be 
performed explicitly using the close construct. 
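The restriction on region unification can be sketched as follows. This Python fragment is a hypothetical illustration and not the Infer algorithm itself: region variables are assumed to be plain strings, the null region is represented by the constant `EPSILON`, a substitution is a dictionary, and letting the null region unify with itself is an assumption of the sketch.

```python
EPSILON = "epsilon"   # stand-in for the null region


class RegionUnifyError(Exception):
    """Raised when unification would bind a region variable to the null region."""


def resolve(region, subst):
    """Follow substitution links until an unbound region is reached."""
    while region in subst:
        region = subst[region]
    return region


def unify_regions(r1, r2, subst):
    """Unify two regions, returning an extended substitution.

    Region variables may only be identified with other region variables;
    an attempt to bind one to EPSILON is rejected, so unification can never
    silently close a mutable reference type.
    """
    r1, r2 = resolve(r1, subst), resolve(r2, subst)
    if r1 == r2:
        return subst
    if EPSILON in (r1, r2):
        raise RegionUnifyError(f"cannot unify {r1} with {r2}")
    extended = dict(subst)
    extended[r1] = r2
    return extended
```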

We will not discuss the details of the inference algorithm here since it is a trivial extension 
of that in [Ler92]. We only state the following propositions that characterize the soundness 
and the completeness of the inference algorithm with respect to the type system described in 
Section 3.2: 


Proposition 3.19 (Soundness of Type Inference) Let $a$ be an expression and $E$ be a type
environment. If $(\tau, \varphi) = Infer(a, E)$ is defined then we can derive $\varphi(E) \vdash a : \tau$.


Proposition 3.20 (Completeness of Type Inference) Let $a$ be an expression and $E$ be a
type environment. If there exists a type $\tau'$ and a substitution $\varphi'$ such that $\varphi'(E) \vdash a : \tau'$,
then $(\tau, \varphi) = Infer(a, E)$ is defined and there exists a substitution $\psi$ such that $\tau' = \psi(\tau)$ and
$\varphi' = \psi \circ \varphi$.


The proofs of these propositions follow exactly as described in [Ler92].
Chapter 4


Closing Data-Structures 


So far, we have shown how to close a single mutable reference location. In this chapter, we show 
how to extend the use of the close construct to complex, multi-level data-structures involving 
tuples, arrays, and general algebraic datatypes. First, we discuss some alternatives for specifying 
the dynamic and static semantics of closing multiple locations and regions simultaneously in 
a multi-level data-structure. This leads us to devise a type-annotation based specification 
mechanism within the source language that permits the user to specify exactly which regions 
and their corresponding locations are to be closed. Next, we discuss the strategies for verifying 
the correctness of this scheme for arrays and general algebraic datatypes. We also briefly discuss 
how this work may be applied to conventional languages such as C, Pascal, or Fortran. Finally, 
we present the summary of Part I and directions for future work based on this research. 


4.1 Specification of “Close” for Multi-Level Data-Structures 


The static and dynamic CLOSE rules shown in Chapter 3 (Figures 3.2 and 3.1 respectively) only 
apply to a single mutable location being returned as the only result from an expression. These 
rules clearly need to be extended for the diverse range of data-structures available in a modern 
programming language. Id offers tuples, arrays, and general algebraic datatypes (including 
recursive datatypes), any of which could be implemented in an imperative manner and may 
need to be closed. Furthermore, the exact mutable locations to be closed may be embedded 
anywhere inside a complex, structured result returned from a computation. Therefore, we need 
a systematic way of closing structured results which involves the following tasks: 


1. Given an expression that returns a structured result, we need to specify which locations 
to close in the dynamic semantics, and the corresponding regions to close in the static 
semantics. 


2. We need to statically verify the soundness of the close operation by clearly identifying 
the scope of the imperative operations taking place on the locations being closed. 


As discussed in Section 2.3.1, treating the close construct as an encapsulator clearly delineates 
the scope of the imperative operations dynamically taking place on the returned result and it 
also statically identifies the type environment against which to verify the closing operation. In 
this section, we discuss the first issue of specifying the semantics of closing multiple locations 
and regions simultaneously within a structured result. 
4.1.1 Dynamic Semantics Issues


A simple and natural way to extend the dynamic semantics of the close construct to multi- 
level data-structures is to take the “all-or-nothing” approach. That is, closing an arbitrary 
data-structure recursively closes all its subcomponents and failure to close any one of the sub- 
components results in the failure to close the entire data-structure. This generalized semantics 
may be expressed in the following dynamic rule for the close construct: 


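$$\text{DYNAMIC-CLOSE1:}\quad \frac{e \vdash a/s \Rightarrow v/s_1 \qquad \{l_1 \ldots l_n\} = Reachable(v, s_1) \qquad s_1(l_i) = v_i, rw \quad 1 \le i \le n}{e \vdash (\text{close } a)/s \Rightarrow v/(s_1 + \{l_i \mapsto v_i, ro\}) \quad 1 \le i \le n}$$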


In the light of the remarks made in Section 2.4.2, we have to be careful not to close locations 
that are reachable from the enclosing environment. Otherwise, we would be able to write a 
universal closing function such as the closeall function shown below that would incorrectly 
close arbitrary mutable objects that are still being used imperatively: 


Example 4.1:
def closeall x = close x;
a = ref 1;
b = closeall a;
a := 2;                     % Dynamic Write Error!


Clearly, such functions should be disallowed because they create spurious dynamic “write- 
errors”, i.e., writing to a location that has been closed unintentionally. We would like to avoid
such spurious errors or at least detect the possibility of creating such errors at the time of 
closing an object rather than at the time of using it. So we modify the DYNAMIC-CLOSE1 rule 
to reflect this strategy: 


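$$\text{DYNAMIC-CLOSE2:}\quad \frac{\begin{array}{c} e \vdash a/s \Rightarrow v/s_1 \\ \{l_1 \ldots l_n\} = Reachable(v, s_1) \setminus Reachable(e, s_1) \\ s_1(l_i) = v_i, rw \quad 1 \le i \le n \end{array}}{e \vdash (\text{close } a)/s \Rightarrow v/(s_1 + \{l_i \mapsto v_i, ro\}) \quad 1 \le i \le n}$$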


The above rule simply excludes all the locations reachable from the environment from being 
closed. This makes the closeall function of Example 4.1 behave like the identity function 
since no external location can now be closed. Alternately, we could introduce a side condition 
on the above rule to produce a dynamic “close-error” if any of the locations being closed was 
present in the environment. 
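The effect of excluding environment-reachable locations can be sketched in Python. The representation is an assumption made for illustration (a hypothetical `Loc` class, tuples for structured values, a store mapping locations to `(value, tag)` pairs, an environment as a dictionary), and the sketch retags every selected location without checking that it currently carries the rw tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """A store location (hypothetical representation)."""
    name: str


def reachable(v, store):
    """All locations reachable from v through tuples and the store."""
    found, todo = set(), [v]
    while todo:
        x = todo.pop()
        if isinstance(x, tuple):
            todo.extend(x)
        elif isinstance(x, Loc) and x not in found:
            found.add(x)
            todo.append(store[x][0])
    return found


def dynamic_close2(v, env, store):
    """Retag as ro every location reachable from v but not from the environment."""
    protected = reachable(tuple(env.values()), store)
    closed = dict(store)
    for loc in reachable(v, store) - protected:
        value, _tag = closed[loc]
        closed[loc] = (value, "ro")
    return v, closed
```

Inside `closeall` of Example 4.1 the argument `x` is itself bound in the environment, so the location it denotes is protected and the call leaves the store unchanged, which is the identity behaviour described above.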

The above rule is still not entirely free of spurious write-errors. In the light of the remarks 
made in Section 2.4.4, we should not close locations that are captured within a function closure 
because such locations may be modified by the function. The following example illustrates this 
scenario: 


Example 4.2:

g = close { b = ref 1;
            def f x = { b := x; };
          in f };

g 2;                        % Dynamic Write Error!
In the above example, an internal mutable location b is captured within a function closure f
which is subsequently closed and returned. If the function body modifies the captured location 
(as it does here) then any application of the function would generate a spurious write-error. 
We can modify the DYNAMIC-CLOSE2 rule to omit closing such locations: 


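$$\text{DYNAMIC-CLOSE3:}\quad \frac{\begin{array}{c} e \vdash a/s \Rightarrow v/s_1 \\ \{l_1 \ldots l_n\} = Closable(v, s_1) \setminus Reachable(e, s_1) \\ s_1(l_i) = v_i, rw \quad 1 \le i \le n \end{array}}{e \vdash (\text{close } a)/s \Rightarrow v/(s_1 + \{l_i \mapsto v_i, ro\}) \quad 1 \le i \le n}$$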


The closable locations of a value v with respect to a store s, written Closable(v,s), are 
defined to be all the reachable locations from the given value except those that are reachable 
via an embedded function closure. A simple way to compute this set would be to modify the 
algorithm GATHER-LOCATIONS given in Section 3.1.2 to collect the locations reachable through 
a function closure at Line 8 in a separate set. This set would then be subtracted from the set 
of all reachable locations of a value to yield the set of closable locations of that value. 
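One possible way to compute the closable set is sketched below in Python. This is not the GATHER-LOCATIONS algorithm of Section 3.1.2, which is not reproduced here; it is an independent illustration that assumes hypothetical `Loc` and `Closure` classes and a store mapping locations to `(value, tag)` pairs. It records, for each visited location, whether it was reached through a closure, and then performs the subtraction described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """A store location (hypothetical representation)."""
    name: str


@dataclass(frozen=True)
class Closure:
    """A function closure; env holds the captured values."""
    env: tuple


def closable(v, store):
    """Reachable locations of v minus those reachable via an embedded closure."""
    seen = set()  # (location, reached_through_closure) pairs already visited

    def walk(x, under_closure):
        if isinstance(x, Closure):
            for captured in x.env:
                walk(captured, True)
        elif isinstance(x, tuple):
            for item in x:
                walk(item, under_closure)
        elif isinstance(x, Loc) and (x, under_closure) not in seen:
            seen.add((x, under_closure))
            walk(store[x][0], under_closure)

    walk(v, False)
    everything = {loc for loc, _ in seen}
    via_closure = {loc for loc, flag in seen if flag}
    return everything - via_closure
```

For the value returned in Example 4.2, a closure capturing `b`, the closable set computed this way is empty, so `b` keeps its rw tag.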

The DYNAMIC-CLOSE3 rule given above seems fairly reasonable as far as the dynamic se- 
mantics of close is concerned for general, multi-level data-structures. 


4.1.2 Static Semantics Issues 


The static semantics for the DYNAMIC-CLOSE3 rule above could be given as follows:

$$\text{STATIC-CLOSE1:}\quad \frac{E \vdash a : \tau \qquad \{r_1 \ldots r_n\} = C(\tau) \setminus F(E)}{E \vdash (\text{close } a) : \{r_i \mapsto \epsilon\}\tau \quad 1 \le i \le n}$$

The above rule erases only the closable regions $C(\tau)$ from the given type $\tau$, where $C(\tau)$ is
the set of all dangerous region variables of $\tau$ except those that occur within the
closure type of a function. It also excludes all regions visible in the type environment.

Although the rules DYNAMIC-CLOSE3 and STATIC-CLOSE1 seem plausible at first glance, 
unfortunately they cannot be shown to be sound with respect to each other. Intuitively, static 
semantics should provide a conservative approximation of what happens dynamically. As far 
as the close construct is concerned, this intuition is captured in Proposition 3.15 where we 
always maintain a correspondence between the reachable locations of a value and the visible 
region variables in its type. Any semantics we give to the close construct must respect
this correspondence, otherwise we will not be able to statically model the dynamics of closing
an object properly. Unfortunately, the rules DYNAMIC-CLOSE3 and STATIC-CLOSE1 do not
correspond to each other in this respect. Consider the following example:


Example 4.3:

x = ref (ref 1);

y = close { a = ref 2;                   % a -> l_1
            b = ref 3;                   % b -> l_2
            c = if true then a else b;   % l_1 and l_2 are region aliased.
            x := c;                      % l_1 escapes.
          in b };

y := 4;                                  % Dynamic Write Error!


In the above example, a and b point to two independent reference locations, say $l_1$ and
$l_2$. The conditional statement for c unifies the static region variables corresponding to these
locations, therefore $l_1$ and $l_2$ become region aliased. This means that statically we cannot
distinguish between these two locations. Assuming dynamically that the predicate resolves to
true and c gets bound to $l_1$, we export $l_1$ into the environment by storing it into an external
location and attempt to close $l_2$ by returning it as a closed result. Dynamically, $l_2$ is not
visible in the environment so the DYNAMIC-CLOSE3 rule would close it. On the other hand,
statically there is no difference between $l_1$ and $l_2$, and since $l_1$ is being exported, the static
rule STATIC-CLOSE1 would not close the corresponding region variable, creating a discrepancy
between the static and the dynamic status of the location $l_2$. This would ultimately lead to a
write-error on the $l_2$ location as shown.

Note that this write-error is generated not because the dynamic semantics for close was closing
a location inappropriately, as was the case for rules DYNAMIC-CLOSE1 and DYNAMIC-CLOSE2.
This error came about because the static semantics was not sufficiently powerful to model the
dynamic semantics accurately. One way to solve this problem is to classify such write-errors
as static “close-errors” by making the static semantics a little more conservative. This can be
accomplished by causing the static rule to fail when a region variable cannot be closed rather
than ignoring it. The following rule embodies this idea:

$$\text{STATIC-CLOSE2:}\quad \frac{E \vdash a : \tau \qquad \{r_1 \ldots r_n\} = C(\tau) \qquad r_i \notin F(E) \quad 1 \le i \le n}{E \vdash (\text{close } a) : \{r_i \mapsto \epsilon\}\tau \quad 1 \le i \le n}$$

Using this rule, Example 4.3 would be classified as a static close-error and would be rejected,
since an attempt was made to close a region (corresponding to locations $l_1$ and $l_2$) which could
not be statically verified for correctness.

Unfortunately, the above rule still suffers from a rather technical problem that stems from
our desire to perform type inference. It turns out that the above rule is not stable under type
substitution (Proposition 3.9). In particular, the set of region variables $C(\varphi(\tau))$ may turn out
to be larger than the set $\varphi(C(\tau)) = \{\varphi(r_1) \ldots \varphi(r_n)\}$ for a general substitution $\varphi$. This implies
that new closable region variables may get introduced into a type by substitution that may not
have been properly verified for correct close semantics previously.
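A minimal instance of this instability, constructed here for illustration and not taken from the original text, takes $\tau$ to be a single type variable $t$ and a substitution that replaces $t$ by a mutable reference type. It assumes that `int` is a base type and that, as in Chapter 3, the region of a mutable reference type is a dangerous (and hence closable) region of that type:

```latex
\tau = t, \qquad \varphi = \{\, t \mapsto \mathrm{int}\ \mathrm{ref}(r) \,\}

C(\tau) = C(t) = \emptyset
\quad\Longrightarrow\quad
\varphi(C(\tau)) = \emptyset

C(\varphi(\tau)) = C(\mathrm{int}\ \mathrm{ref}(r)) = \{ r \}
\;\supsetneq\; \varphi(C(\tau))
```

Under these assumptions the region $r$ becomes closable only after the substitution is applied, so a derivation that used STATIC-CLOSE2 on $\tau$ never checked the side condition $r \notin F(E)$ for it.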

Stability of substitution is used in showing semantic generalization (Proposition 3.14) as 
well as the soundness of type inference (Proposition 3.19). The former could be attributed 
to the specific style of relational semantics we have decided to follow in this thesis but the 
latter is fairly standard machinery in the literature and, if possible, we would like to retain 
it. Intuitively, failure of stability of substitution means that it may not be possible to show 
the soundness of a type inference algorithm based on this rule using standard unification and 
substitution machinery. 


4.1.3 Combining Type Generalization and Closing 


One way to devise a stable static rule for the close construct is to combine polymorphic 
generalization and object closing into a new language construct letclose x = $a_1$ in $a_2$ that
behaves exactly like let x = $a_1$ in $a_2$ except that it erases all closable regions in the type of
the expression $a_1$ and then immediately generalizes that type before binding the resulting type
scheme to x. Intuitively, type generalization protects a typing derivation from later substitutions 
by quantifying its free type variables. Subsequent substitutions are then applied to polymorphic 
instantiations of the resulting type scheme which does not affect the original typing derivation. 
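A possible dynamic and static semantics of the letclose construct is shown below:


DYNAMIC-LETCLOSE:

$$\frac{\begin{array}{c}
e \vdash a_1/s \Rightarrow v_1/s_1 \qquad \{l_1 \ldots l_n\} = \mathit{Closable}(v_1, s_1) \setminus \mathit{Reachable}(e, s_1) \\
s_1(l_i) = v'_i, \mathrm{rw} \qquad s'_1 = s_1 + \{l_i \mapsto v'_i, \mathrm{ro}\} \quad 1 \le i \le n \\
e + \{x \mapsto v_1\} \vdash a_2/s'_1 \Rightarrow v_2/s_2
\end{array}}
{e \vdash (\mathtt{letclose}\ x = a_1\ \mathtt{in}\ a_2)/s \Rightarrow v_2/s_2}$$


STATIC-LETCLOSE:

$$\frac{E \vdash a_1 : \tau_1 \qquad E + \{x \mapsto \mathit{GenClose}(E, \tau_1)\} \vdash a_2 : \tau_2}
       {E \vdash \mathtt{letclose}\ x = a_1\ \mathtt{in}\ a_2 : \tau_2}$$


Where,

$$\begin{aligned}
\{r_1 \ldots r_n\} &= C(\tau) \setminus F(E) \\
\tau' &= \{r_i \mapsto \epsilon\}\,\tau \quad 1 \le i \le n \\
\{\alpha_1 \ldots \alpha_m\} &= F(\tau') \setminus D(\tau') \setminus F(E) \\
\mathit{GenClose}(E, \tau) &= \forall \alpha_1 \ldots \alpha_m.\ \tau'
\end{aligned}$$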


These rules formalize what we have informally stated in the above paragraph. In this 
formulation, closing an object does not fail; instead, the definition of GenClose given above
simply ignores any non-closable regions and does not generalize them. This property stems
from the desire to keep type generalization as a non-failing property: if the type of an object 
cannot be generalized at a given scope, it is best left as a monomorphic type rather than flagging 
a “polymorphism-error”. 

Unfortunately, the above formulation suffers from the same region aliasing problem as dis- 
cussed earlier in the context of the STATIC-CLOSE1 rule. Dynamically closable locations may be 
aliased to statically non-closable regions, and this discrepancy is silently ignored in the above 
rules. We can fix this problem as in the case of rule STATIC-CLOSE2, by flagging a static close- 
error if we fail to close a region that we were expected to close. Unfortunately, this conflicts 
with the requirement of non-failing type generalization. 
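
The non-failing behaviour of GenClose can be sketched in Python as follows. The tuple encoding of types, the reading of C as "all region variables in the type", and the simplified notion of dangerous variables (everything at or under a reference that is still mutable) are assumptions made for illustration only.

```python
# Hypothetical encoding: ("var", t), ("ref", region_or_None, elem), ("tuple", fields).
# A region of None stands for the null region.

def free_vars(ty):
    """F(tau): type variables and region variables occurring in ty."""
    if ty[0] == "var":
        return {ty[1]}
    if ty[0] == "ref":
        own = {ty[1]} if ty[1] is not None else set()
        return own | free_vars(ty[2])
    if ty[0] == "tuple":
        return set().union(*(free_vars(f) for f in ty[1]))
    return set()

def regions(ty):
    """Assumed C(tau): region variables occurring in ty."""
    if ty[0] == "ref":
        own = {ty[1]} if ty[1] is not None else set()
        return own | regions(ty[2])
    if ty[0] == "tuple":
        return set().union(*(regions(f) for f in ty[1]))
    return set()

def dangerous(ty):
    """Assumed D(tau): variables at or under a reference that is still mutable."""
    if ty[0] == "ref":
        return free_vars(ty) if ty[1] is not None else dangerous(ty[2])
    if ty[0] == "tuple":
        return set().union(*(dangerous(f) for f in ty[1]))
    return set()

def erase(ty, closed):
    """Replace every region in `closed` by the null region."""
    if ty[0] == "ref":
        region = None if ty[1] in closed else ty[1]
        return ("ref", region, erase(ty[2], closed))
    if ty[0] == "tuple":
        return ("tuple", tuple(erase(f, closed) for f in ty[1]))
    return ty

def gen_close(env_free, ty):
    """Return (quantified variables, closed type); never raises."""
    closed = regions(ty) - env_free              # C(tau) \ F(E)
    ty2 = erase(ty, closed)                      # tau'
    quantified = free_vars(ty2) - dangerous(ty2) - env_free
    return sorted(quantified), ty2

pair = ("tuple", (("ref", "r1", ("var", "t1")), ("ref", "r2", ("var", "t2"))))
scheme = gen_close({"r2"}, pair)
```

Here r2 occurs in the type environment, so it is silently left open and t2 stays monomorphic, while r1 is erased and t1 is generalized. No error is reported for r2, which is exactly the discrepancy discussed above.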


4.1.4. Discussion 


We have seen above that the problem of devising a sound static and dynamic semantics for a 
close construct for multi-level data-structures and functions is quite tricky and has many
potentially conflicting requirements. This warrants a re-inspection of our approach towards this 
problem. 

Extending the static and dynamic semantics of a language to handle additional complexity 
and/or language constructs must fulfill the following requirements: 


1. The dynamic semantics of a new language construct should be able to accurately model 
what that construct is intended to do in a simple and intuitive manner. The semantics 
should also take into consideration what is efficiently implementable on a machine. This 
conflict among what we intend, what we can model, and what we can efficiently implement 
is very important to resolve in the design of a new language construct. 


2. Similarly, the static semantics machinery should be intuitive, efficiently implementable, 
internally stable, and externally consistent with respect to the dynamic semantics. The 
consistency requirement places a lot of constraints on the static machinery and it may 
not always yield the most general solutions. 


3. Finally, we should also pay attention to other requirements on the design of a new language
construct, such as simple and understandable syntax, type inference, etc., that may
not directly affect its semantics or the efficiency of implementation but may affect its
widespread acceptability as a useful construct.


In the light of the above remarks, we have decided to abandon the search for a universal 
CLOSE rule. Below, we present our proposal for a family of CLOSE rules for closing a fixed set 
of regions and locations depending on the structure of the object at hand. 


4.1.5 Closing a Fixed Set of Regions/Locations 


The important point to realize is that closing a known set of locations that are characterized by 
a statically fixed set of region variables is perfectly sound. In the above examples, we ran into 
trouble when we tried to close an arbitrary set of locations for which we could not determine a 
statically fixed set of region variables. 

In some sense, closing only a fixed set of region variables at a time gives us finer-grained
control over what locations are being closed dynamically. In order for this strategy to work
with multi-level data-structures, the following requirements must be met: 


1. We need to specify statically which region variables we want to close. 


2. We should be able to verify the soundness of closing these region variables against the 
type environment and other region variables that have not been closed. 


3. The locations corresponding to the regions being closed must be similarly identifiable and 
closable in the dynamic semantics. 


4. Finally, all the locations and the regions being closed and those that are left aside must
jointly satisfy the region abstraction Proposition 3.15, i.e., we cannot close a region variable
statically without closing all its corresponding locations in the dynamic semantics
and vice versa (region aliasing).


The above requirements directly lead us to an approach where we do not have universal 
static and dynamic semantics rules for the close construct. Instead, we have an algorithm 
to synthesize an exact static and dynamic semantics rule for each multi-level data-structure 
pattern that we wish to close. This would give rise to a family of rules depending on the 
structure of the object at hand and the particular set of locations we wish to close within that
object. For example, closing an n-tuple consisting of n reference locations can be accomplished
using the following rules (cf. single reference CLOSE rules in Figures 3.1 and 3.2):


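DYNAMIC-TUPCLOSE:

$$\frac{\begin{array}{c}
e \vdash a/s \Rightarrow (\text{n-tup}\ l_1 \ldots l_n)/s_1 \qquad s_1(l_i) = v_i, \mathrm{rw} \quad 1 \le i \le n \\
L = \mathit{Reachable}((\text{n-tup}\ l_1 \ldots l_n), s_1) \cup \mathit{Reachable}(e, s_1) \cup \bigcup_{l' \in \mathit{Dom}(s)} \mathit{Reachable}(l', s_1)
\end{array}}
{e \vdash (\mathtt{close}\ a)/s \Rightarrow (\text{n-tup}\ l_1 \ldots l_n)/(s_1|_L + \{l_i \mapsto v_i, \mathrm{ro}\}) \quad 1 \le i \le n}$$


STATIC-TUPCLOSE:

$$\frac{\begin{array}{c}
E \vdash a : (\tau_1\ \mathit{ref}(r_1)), \ldots, (\tau_n\ \mathit{ref}(r_n)) \\
r_i \notin (F(E) \cup F(\tau_1) \cup \cdots \cup F(\tau_n)) \quad 1 \le i \le n
\end{array}}
{E \vdash \mathtt{close}\ a : (\tau_1\ \mathit{ref}(\epsilon)), \ldots, (\tau_n\ \mathit{ref}(\epsilon))}$$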


Similar rules may be constructed for any subset of tuple fields containing reference values. 
Extending the above rules for closing tuples of references and vectors, we can easily handle the 
following example that combines their use in a non-standard way:

Example 4.4:
def polar2rect n =
  close { xs = i_vector (1,n);
          ys = i_vector (1,n);
          rsum = ref 0.0;
          _ = { for i <- 1 to n do
                  rad,theta = ... some large computation ...;
                  xs[i] = rad * sin theta;
                  ys[i] = rad * cos theta;
                  rsum := !rsum + rad; }
        in !rsum/n, xs, ys };


Here, two vectors are closed and returned along with the accumulated average of a third 
quantity, all arising out of the same large shared computation. It is important not to repeat 
the computation and keep the storage space to a minimum. The use of an imperative style 
protected by the close construct makes the computation efficient and understandable without 
sacrificing overall functional behavior. 


Steps in Synthesizing CLOSE Rules


In general, given an arbitrary program expression a that returns a structured result, synthesizing 
a specialized static CLOSE rule involves the following steps: 


1. A group of region variables to be closed is identified from the type of the expression a
using some appropriate language syntax.


2. These region variables are then verified for soundness. This requires that none of these
region variables occurs in the type environment or in the type of the closed result
being returned. Furthermore, none of these region variables may occur inside the closure
type of an embedded function type, as pointed out earlier.


3. If all the region variables pass the verification, they are erased from the type of the result, 
and the closed type is returned. Otherwise a static close-error is flagged. 
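
The three static steps can be sketched for the tuple-of-references case (rule STATIC-TUPCLOSE) as follows. This is a minimal illustration under an assumed tuple encoding of types; the names static_tupclose and StaticCloseError are hypothetical, and the check on closure types of embedded functions is not modelled.

```python
class StaticCloseError(Exception):
    """Raised when a region to be closed fails verification."""

def free_vars(ty):
    # Hypothetical encoding: ("int",), ("var", t), ("ref", region_or_None, elem).
    if ty[0] == "var":
        return {ty[1]}
    if ty[0] == "ref":
        own = {ty[1]} if ty[1] is not None else set()
        return own | free_vars(ty[2])
    return set()

def static_tupclose(env_free, field_types):
    """field_types is a list of ("ref", r_i, tau_i); returns the closed field types."""
    # Step 1: identify the region variables to be closed.
    regions = [field[1] for field in field_types]
    # Step 2: verify them against F(E) and F(tau_1) ... F(tau_n).
    forbidden = set(env_free)
    for field in field_types:
        forbidden |= free_vars(field[2])
    for region in regions:
        if region in forbidden:
            raise StaticCloseError(f"region {region} cannot be closed")
    # Step 3: erase the verified regions from the result type.
    return [("ref", None, field[2]) for field in field_types]

fields = [("ref", "r1", ("int",)), ("ref", "r2", ("var", "t"))]
closed_fields = static_tupclose(set(), fields)
```

Calling static_tupclose({"r2"}, fields) raises StaticCloseError in this sketch, since r2 is still visible from the type environment.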


Similarly, synthesizing a specialized dynamic CLOSE rule involves the following steps: 


1. A group of locations to be closed is identified from the given value that correspond to the 
static region variables being closed. 


2. These locations are verified to possess the read/write tag within the current store.
Otherwise, a dynamic close-error is raised.


3. If all the locations pass the verification, their tags are flipped to read-only and the closed
value is returned along with the current store with a slightly restricted domain, as shown
in Chapter 3, dropping any region-aliased handles to the locations being closed.
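
The dynamic steps can be sketched over a store modelled as a Python dictionary from locations to (value, tag) pairs. The function and exception names are hypothetical, and the restriction of the store domain that drops region-aliased handles is deliberately left out of this sketch.

```python
class DynamicCloseError(Exception):
    """Raised when a location to be closed is not tagged read/write."""

def close_locations(store, locations):
    """Return a copy of `store` in which `locations` are tagged read-only."""
    # Step 2: every location must currently carry the read/write tag.
    for loc in locations:
        _value, tag = store[loc]
        if tag != "rw":
            raise DynamicCloseError(f"location {loc} is not read/write")
    # Step 3: flip the tags to read-only, leaving the stored values unchanged.
    closed = dict(store)
    for loc in locations:
        closed[loc] = (store[loc][0], "ro")
    return closed

store = {"l1": (3.0, "rw"), "l2": (4.0, "rw"), "l3": (0, "rw")}
after_close = close_locations(store, ["l1", "l2"])
```

Closing the same locations a second time raises DynamicCloseError in this sketch, because their tags are no longer read/write.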


4.1.6 Type Annotations as “Close” Specifications 


A simple way of specifying which regions to close in an arbitrary expression is to match it 
against a separate pattern and mark certain regions to be closed in that pattern. Note that 
this pattern matches the type of the expression and not its value. This is because several 
locations may be aliased to the same region variable by definition and we must close all of them
simultaneously. It therefore makes sense to specify them once using their type rather than to specify
each of the locations individually.

A type pattern may be specified in a type annotation for the close expression as shown 
below: 


EXPRESSIONS:  a ::= ...
                |   (close a) :: $\tau_{ann}$        close expression


Here, the expression a would usually be a program block which returns a structured result. 
The annotation type $\tau_{ann}$ would explicitly show the various type constructors present within
the expression’s type along with their region parameters. The precise region parameters to be
closed are specified using the null region ($\epsilon$). The syntax used for specifying the annotation
type is the full type grammar shown in Section 3.2.1 with the addition of a “don’t care” type 
pattern (_) that may be used in place of any type, region, or closure type expression within the 
annotation. The scope of the free type, region, and closure extension variables of the annotation 
type is taken to be that annotation itself; annotation types in different parts of the program do 
not share variables. 

Examples of this specification have already appeared in Chapter 2 within Examples 2.15, 
2.20, and 2.21. The static typing rule for such type-annotated expressions may now be given 
as follows: 


ANNOTE-CLOSE:

$$\frac{E \vdash a : \tau_{inf} \qquad \{r_1 \ldots r_n\} = (\tau_{inf} \ominus \tau_{ann}) \qquad r_i \notin (F(E) \cup \tau_{ann}) \quad 1 \le i \le n}
       {E \vdash (\mathtt{close}\ a :: \tau_{ann}) : \tau_{ann}}$$

The type $\tau_{inf}$ stands for the inferred type of the expression a. The operation $(\tau_{inf} \ominus \tau_{ann})$
matches the annotation type against the inferred type to determine the exact set of region
variables being closed. Unlike the STATIC-CLOSE2 rule, this set remains stable under type
substitution because the annotation type never changes. Below, we outline the mechanism of
type and region matching and the subsequent verification of the close operation:


1. The types $\tau_{inf}$ and $\tau_{ann}$ must match exactly¹ except that some region variables in $\tau_{inf}$ may
be closed in $\tau_{ann}$. For each parameterized type constructor $T(\rho_1 \ldots \rho_n)$ the number of
regions in the inferred and annotated type must also match. For syntactic convenience,
we may allow a parameterized type constructor to appear without any region parameters
in the given annotation, in which case all its region parameters are assumed to be the
null region.


2. Each inferred region parameter is positionally matched with the corresponding annotated
region parameter in order to determine the precise set of region variables being closed:


• A null region in the inferred type must match a null region in the annotation type.
These represent previously closed regions that cannot be opened again.


• A region variable r in the inferred type matches a null region in the annotation type
and is considered as being closed unless it occurs within the closure type of a function
(Section 2.4.4). In the latter case, a static close-error is flagged.


• A region variable r in the inferred type also matches a region variable r’ in the annotation
type as long as all occurrences of r in the inferred type match the same region
variable r’ in the annotation type. For convenience, we may allow this matching
to behave like a region variable constraint on the inferred region parameters rather
than a mere renaming of variables. A unification substitution $\{r \mapsto r'\}$ may need to
be generated in this case.


¹Each occurrence of the “don’t care” type pattern (_) within the annotation type is always assumed to match
the corresponding type, region, or closure type expression present in the inferred type.


3. Finally, all region variables determined as being closed are collected in a set taking region 
variable constraints and variable renaming into account. This set of region variables, say 
$\{r_1 \ldots r_n\}$, can then be verified for soundness as shown in the above rule ANNOTE-CLOSE.
Checking that no region variable $r_i$ being closed appears anywhere within the current
type environment E or within the annotation type $\tau_{ann}$ ensures that the corresponding
closed locations are not reachable from the dynamic environment or the returned value. 
This is similar in spirit to the simple CLOSE rule shown in Figure 3.2. 
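
The matching and verification steps above can be sketched in Python. The tuple encoding of types, the helper names, and the way surviving regions are tracked (a closed region must not also survive under a region variable or a “don’t care” pattern) are assumptions of this sketch; function closure types and the constraint-style unification variant are not modelled.

```python
WILD = "_"   # the "don't care" pattern

class StaticCloseError(Exception):
    pass

def regions_of(ty):
    # Hypothetical encoding: ("float",), ("ref"/"vector", region_or_None, elem), ("tuple", fields).
    if ty[0] in ("ref", "vector"):
        own = {ty[1]} if ty[1] is not None else set()
        return own | regions_of(ty[2])
    if ty[0] == "tuple":
        return set().union(*(regions_of(f) for f in ty[1]))
    return set()

def match(inf, ann, closed, kept, renaming):
    """Positionally match inferred type `inf` against annotation `ann`."""
    if ann == WILD:
        kept |= regions_of(inf)            # everything under "_" stays as inferred
        return
    if inf[0] != ann[0]:
        raise StaticCloseError("type constructors differ")
    if inf[0] in ("ref", "vector"):
        r_inf, r_ann = inf[1], ann[1]
        if r_ann == WILD:
            if r_inf is not None:
                kept.add(r_inf)
        elif r_inf is None:
            if r_ann is not None:
                raise StaticCloseError("a closed region cannot be opened again")
        elif r_ann is None:
            closed.add(r_inf)              # region variable matched with the null region
        else:
            if renaming.setdefault(r_inf, r_ann) != r_ann:
                raise StaticCloseError("inconsistent region renaming")
            kept.add(r_inf)
        match(inf[2], ann[2], closed, kept, renaming)
    elif inf[0] == "tuple":
        if len(inf[1]) != len(ann[1]):
            raise StaticCloseError("tuple arities differ")
        for sub_inf, sub_ann in zip(inf[1], ann[1]):
            match(sub_inf, sub_ann, closed, kept, renaming)
    elif inf != ann:
        raise StaticCloseError("types differ")

def annotated_close(env_free, inf, ann):
    """Return the set of closed region variables, or raise StaticCloseError."""
    closed, kept, renaming = set(), set(), {}
    match(inf, ann, closed, kept, renaming)
    escaping = closed & (set(env_free) | kept)
    if escaping:
        raise StaticCloseError(f"regions escape: {sorted(escaping)}")
    return closed

flt = ("float",)
inferred = ("tuple", (flt, ("vector", "r1", flt), ("vector", "r2", flt)))
annotation = ("tuple", (WILD, ("vector", None, WILD), ("vector", None, WILD)))
closed_regions = annotated_close(set(), inferred, annotation)
```

The inferred and annotated types in the usage lines mirror the shape of the result of Example 4.4: a number followed by two vectors, both of which are closed.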


The above scheme achieves both our original goals of specifying the regions to be closed 
and pinpointing the type environment to verify them against with a single, familiar language 
construct. Moreover, it specifies multiple regions to be closed at various levels of a structured
result simultaneously, and it does this without adding any semantic or syntactic
complexity beyond what was already present in the kernel language of Chapter 3.

This scheme also identifies the dynamic locations to be closed quite easily. The structure 
of tuple types directly reflects the structure of the tuples themselves. Therefore, the static 
distribution of region variables to be closed within a structured type annotation directly leads
us to the locations that need to be closed in the corresponding structured result. Locations 
within embedded function closures must never be closed, which is why the corresponding region 
variables are caught and flagged as a static close-error. 

In the next two sections, we describe the semantics and close specification for arrays and 
general algebraic datatypes based on the above strategy. 


4.2 Closing Arrays 


4.2.1. Dynamic Semantics 


We can easily generalize a single mutable reference location introduced in Chapter 3 to an array 
of indexed locations all of which belong to the same region. In fact, the ref construct may be 
viewed as a special case of a 1-dimensional array with length 1. Indexed locations effectively 
model consecutive memory addresses on which index computations may be performed, although 
the starting location of the array would still remain abstract. This treatment of locations is 
a little more concrete than that in Chapter 3 where every location was considered to be an 
independent abstract label. 

We represent a 1-dimensional array as a pair (vect l, n) giving the starting location l and
its length as a positive integer literal n. These are added to the set of dynamic values:


VALUES:  v ::= ...
           |   (vect l, n)        vector of length n


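The values associated with the slots $0 \le i < n$ of a vector (vect l, n) are stored at the
locations $l, \ldots, l+n-1$ within the store s. All these locations are assumed to be directly
accessible from the vector value:


$$\mathcal{L}((\mathrm{vect}\ l, n)) = \{l, \ldots, l+n-1\}$$


VECT-ALLOC:

$$\frac{e \vdash a/s \Rightarrow n/s_1 \qquad (l+i) \notin \mathit{Dom}(s_1) \quad 0 \le i < n}
       {e \vdash \mathtt{allocvect}(a)/s \Rightarrow (\mathrm{vect}\ l, n)/(s_1 + \{l+i \mapsto \bot, \mathrm{rw}\}) \quad 0 \le i < n}$$


VECT-DEREF:

$$\frac{\begin{array}{c}
e \vdash a_1/s \Rightarrow (\mathrm{vect}\ l, n)/s_1 \qquad e \vdash a_2/s_1 \Rightarrow i/s_2 \\
(l+i) \in \mathit{Dom}(s_2) \qquad \mathit{value}(s_2(l+i)) = v
\end{array}}
{e \vdash a_1[a_2]/s \Rightarrow v/s_2}$$


VECT-ASSIGN:

$$\frac{\begin{array}{c}
e \vdash a_1/s \Rightarrow (\mathrm{vect}\ l, n)/s_1 \qquad e \vdash a_2/s_1 \Rightarrow i/s_2 \qquad e \vdash a_3/s_2 \Rightarrow v/s_3 \\
(l+i) \in \mathit{Dom}(s_3) \qquad \mathit{tag}(s_3(l+i)) = \mathrm{rw}
\end{array}}
{e \vdash (a_1[a_2] = a_3)/s \Rightarrow ()/(s_3 + \{l+i \mapsto v, \mathrm{rw}\})}$$


VECT-CLOSE:

$$\frac{\begin{array}{c}
e \vdash a/s \Rightarrow (\mathrm{vect}\ l, n)/s_1 \qquad s_1(l+i) = v_i, \mathrm{rw} \quad 0 \le i < n \\
L = \mathit{Reachable}((\mathrm{vect}\ l, n), s_1) \cup \mathit{Reachable}(e, s_1) \cup \bigcup_{l' \in \mathit{Dom}(s)} \mathit{Reachable}(l', s_1)
\end{array}}
{e \vdash (\mathtt{close}\ a)/s \Rightarrow (\mathrm{vect}\ l, n)/(s_1|_L + \{l+i \mapsto v_i, \mathrm{ro}\}) \quad 0 \le i < n}$$


Figure 4.1: Dynamic Semantics of Arrays.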


We also extend reachability (Definition 3.2) for vector values: 


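$$\mathit{Reachable}((\mathrm{vect}\ l, n), s) = \begin{cases}
\emptyset & \text{if } l \notin \mathit{Dom}(s) \\
\mathcal{L}((\mathrm{vect}\ l, n)) \cup \bigcup_{0 \le i < n} \mathit{Reachable}(\mathit{value}(s(l+i)), s) & \text{otherwise}
\end{cases}$$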


The algorithm GATHER-LOCATIONS is correspondingly extended to collect such locations. 
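
The extended reachability computation can be sketched in Python as below. The store is modelled as a dictionary from integer locations to (value, tag) pairs and a vector value as the tuple ("vect", l, n); these representations are assumptions of the sketch. A worklist with a visited set is used in place of the recursive definition so that cyclic stores terminate, and the sketch is not meant to reproduce the thesis's GATHER-LOCATIONS algorithm.

```python
def reachable(value, store):
    """Locations reachable from `value`, following vectors stored inside vectors."""
    seen = set()
    pending = [value]
    while pending:
        v = pending.pop()
        if isinstance(v, tuple) and v and v[0] == "vect":
            _, start, length = v
            if start not in store:
                continue                    # first case of the definition: empty set
            for loc in range(start, start + length):
                if loc not in seen:
                    seen.add(loc)
                    pending.append(store[loc][0])
    return seen

# A one-slot vector at location 0 whose slot holds a two-slot vector at 10..11.
store = {0: (("vect", 10, 2), "rw"), 10: (1, "rw"), 11: (2, "rw")}
locs = reachable(("vect", 0, 1), store)
```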

Figure 4.1 shows the dynamic semantics rules for 1-dimensional arrays. These are straightforward
generalizations of the corresponding rules for the ref construct. The primitive operator
rules for vector allocation (allocvect), vector dereference (a[i]), and vector assignment
(a[i]=v) operate as expected. During vector allocation, n fresh locations are added to the
domain of the store, each of which is initialized to a special “undefined” constant ($\bot$).² The
domain validity test in dereference and assignment rules simulates bounds checking because only
the indices within the bounds $l \ldots l+n-1$ would be present within the domain of the store
for a given vector value (vect l, n). Finally, the VECT-CLOSE rule closes all the locations of the
vector simultaneously.
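
The four rules of Figure 4.1 can be mimicked by a small Python store. This is a sketch under stated assumptions: locations are modelled as (base, offset) pairs so that the starting location stays abstract and the domain test really does act as a bounds check, the class and method names are hypothetical, and the store restriction performed by VECT-CLOSE is not modelled.

```python
import itertools

UNDEFINED = object()          # stands for the "undefined" constant

class CloseError(Exception):
    pass

class Store:
    def __init__(self):
        self.cells = {}                      # (base, offset) -> [value, tag]
        self._fresh = itertools.count()      # supply of fresh abstract bases

    def allocvect(self, n):
        base = next(self._fresh)
        for i in range(n):
            self.cells[(base, i)] = [UNDEFINED, "rw"]
        return ("vect", base, n)

    def deref(self, vec, i):
        _, base, _n = vec
        if (base, i) not in self.cells:      # domain test doubles as bounds check
            raise IndexError(i)
        return self.cells[(base, i)][0]

    def assign(self, vec, i, value):
        _, base, _n = vec
        if (base, i) not in self.cells:
            raise IndexError(i)
        cell = self.cells[(base, i)]
        if cell[1] != "rw":                  # VECT-ASSIGN requires the rw tag
            raise PermissionError("write to a closed location")
        cell[0] = value

    def close(self, vec):
        _, base, n = vec
        cells = [self.cells[(base, i)] for i in range(n)]
        if any(tag != "rw" for _value, tag in cells):
            raise CloseError("vector is already closed")
        for cell in cells:                   # all locations are closed together
            cell[1] = "ro"
        return vec

s = Store()
v = s.allocvect(3)
s.assign(v, 0, 1.5)
first = s.deref(v, 0)
s.close(v)
```

After the final s.close(v), reads such as s.deref(v, 0) still succeed, while s.assign(v, 0, 2.0) raises PermissionError.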

Multi-dimensional arrays may be modeled in a similar fashion or may be linearized into 1- 
dimensional arrays. In the latter case, the linearized vector value may need to keep additional 
information to translate a multi-dimensional index into a linearized index. 


4.2.2 Static Semantics 


Since arrays are considered to be homogeneous data-structures, all values contained in an array must
have the same type and all its locations must belong to the same region. This means that a
single region variable suffices to represent the imperative properties of the array. Therefore, a
mutable vector containing values of type $\tau$ is typed as ($\tau$ vector(r)) just like a mutable reference
type ($\tau$ ref(r)). The free and dangerous variables of the vector type are also computed just like
those for a reference type.


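²This formulation is useful for synchronized arrays (I-structures and M-structures); conventional unsynchronized
arrays as shown here may in fact be initialized with any constant of the appropriate type.

The types of the primitive array operators are shown below:


$$\begin{aligned}
\mathit{typeof}(\mathtt{allocvect}) &= \forall t, u, r.\ \mathit{int} \xrightarrow{(u)} t\ \mathit{vector}(r) \\
\mathit{typeof}(\_[\_]_{\text{mutable}}) &= \forall t, u, r.\ (t\ \mathit{vector}(r), \mathit{int}) \xrightarrow{(u)} t \\
\mathit{typeof}(\_[\_]_{\text{non-mutable}}) &= \forall t, u.\ (t\ \mathit{vector}(\epsilon), \mathit{int}) \xrightarrow{(u)} t \\
\mathit{typeof}(\_[\_]\,\mathtt{=}\,\_) &= \forall t, u, r.\ (t\ \mathit{vector}(r), \mathit{int}, t) \xrightarrow{(u)} \mathit{unit}
\end{aligned}$$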


The static semantics rule for closing arrays operates exactly like that for the ref construct 
and is shown below: 


VECT-CLOSE:

$$\frac{E \vdash a : \tau\ \mathit{vector}(r) \qquad r \notin (F(E) \cup F(\tau))}
       {E \vdash \mathtt{close}\ a : \tau\ \mathit{vector}(\epsilon)}$$


All the proofs for the ref construct given in Sections 3.1 and 3.2 extend naturally to arrays 
since all the locations contained within an array are simply an extension of its starting location
l. We never create “internal” pointers into the middle of an array and operate on individual
locations of the array. For instance, all indexed references on vectors operate on the value
(vect l, n) and an index offset i; l+i by itself is not taken to be a valid value. For the purpose
of reachability, this ensures that all locations of an array are always taken together in a group 
which is similar in spirit to the treatment of the ref construct. 


4.2.3. Semantic Model and Soundness 


The store typing S carries the type ($\tau$ vector($\rho$)) at every location of the vector just like it
carries the full reference type at a ref allocated location. Thus, we can extend the semantic 
model (Definition 3.12) in the obvious manner: 


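Definition 4.1 (Extended Semantic Model) Let s be a store, S be a store typing, e be an
environment, E be a type environment, v be a value, $\tau$ be a type, and $\sigma$ be a type scheme.
Define the following relations:
Case 1: $S \models v : \tau$ ...
SubCase 1.6: $S \models (\mathrm{vect}\ l, n) : \tau\ \mathit{vector}(r)$, if $(l+i) \in \mathit{Dom}(S)$ and $S(l+i) = \tau\ \mathit{vector}(r)$
for all $0 \le i < n$.
SubCase 1.7: $S \models (\mathrm{vect}\ l, n) : \tau\ \mathit{vector}(\epsilon)$, if $(l+i) \in \mathit{Dom}(S)$ and $S(l+i) = \tau'$ for all
$0 \le i < n$. Furthermore, there exists a substitution $\varphi$ with $\mathit{Dom}(\varphi) \subseteq F(\tau') \setminus D(\tau')$ such
that $\varphi(\tau') = \tau\ \mathit{vector}(\epsilon)$.
Case 4: $\models s : S$ ...
SubCase 4.3: If $S(l) = \tau\ \mathit{vector}(r)$ then $s(l) = v, \mathrm{rw}$ and $S \models v : \tau$
SubCase 4.4: If $S(l) = \tau\ \mathit{vector}(\epsilon)$ then $s(l) = v, \mathrm{ro}$ and $S \models v : \tau$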


Proofs for semantic soundness from Section 3.3 also extend naturally to vectors using this 
extended semantic model. A simple reference value l is replaced by a vector value (vect l, n)
and statements about the store typing of that location S(l) are replaced by those applying to
the group of locations S(l+i) for all $0 \le i < n$. Proofs that do not directly depend on the structure
of values or of evaluation rules such as the region abstraction Proposition 3.15 do not change 
at all. 

The above machinery allows us to finally answer the problem we posed at the beginning of 
Section 2.1 about implementing functional arrays in Id. The solution proposed in Section 2.3 
for implementing function make_vector (Example 2.13) can now be automatically verified for 
correctness by the type system and is reproduced below:

Example 4.5:
i_vector :: ∀t,u,r. (int, int) -(u)-> (t vector(r))
make_vector :: ∀t,u. (int -(u)-> t) -> (int, int) -(int -(u)-> t)-> (t vector(ε))
def make_vector f (l,u) =
  close { a = i_vector (l,u);
          _ = { for i <- l to u do
                  a[i] = f i };
        in a };


The i_vector primitive allocates an empty vector between bounds (l, u) and initializes it to
contain the “undefined” value ($\bot$) everywhere. The region variable in the type of the allocated
vector shows that it is assignable. On the other hand, the null region ($\epsilon$) in the type of the
returned vector from make_vector shows that it has been safely closed into a functional vector. 
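
A loose Python analogue of make_vector is sketched below, assuming floating-point elements. It fills a mutable array.array and returns a read-only memoryview over the same buffer, so no copy is made, which is similar in spirit to close changing only the view of the object. The important difference is that Python checks nothing statically: the result is effectively functional only because the local handle a does not escape, which is precisely the property the type system above verifies.

```python
from array import array

def make_vector(f, bounds):
    """Build a vector of f(i) for i in l..u and return a read-only view of it."""
    l, u = bounds
    a = array("d", [0.0] * (u - l + 1))     # mutable while under construction
    for i in range(l, u + 1):
        a[i - l] = f(i)
    return memoryview(a).toreadonly()       # same buffer, writes now rejected

squares = make_vector(lambda i: i * i, (1, 4))
```

Attempting squares[0] = 0.0 raises TypeError, while reads such as squares[2] continue to work.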


4.2.4 Modeling I-Structure and M-Structure Arrays 


Readers may have noticed that the above description only presents unsynchronized mutable 
arrays that are closed into unsynchronized functional arrays. A few words are appropriate here 
regarding the modeling of synchronized (I-structure and M-structure) arrays present in Id. 

As discussed in Section 2.3.5, a mutable array may be implemented using any one of the 
three underlying memory access protocols: unsynchronized, I-structure, or M-structure (refer to
Figure 2.1). Similarly, a functional array may be implemented using one of the two protocols:
unsynchronized, or I-structure. However, the static typing machinery presented above allows us
to only distinguish between a single mutable vector type vector(r) and its corresponding functional
vector type vector($\epsilon$). It does not matter which underlying protocol each type represents
as long as we use the appropriate kind of barrier during the close operation (see Section 2.3.5), 
and that objects belonging to the two types are represented in the same way. The latter con- 
dition is required so that the close construct can simply change the view of an object from 
mutable to functional without requiring any data layout conversion. 

In a conventional language such as C or Fortran, with only one kind of memory access proto- 
col (unsynchronized), the simple two-way classification described above is sufficient. However, 
in Id we use two memory access protocols: I-structure and M-structure, giving rise to two types 
of assignable arrays and one type of functional arrays. Since functional objects in Id are also
implemented using I-structures, it is natural to use the I-structure protocol for objects with
either the assignable type vector(r) or the functional type vector($\epsilon$). This way, the underlying
data layout is guaranteed to be the same in the two cases and no barrier is needed during the 
corresponding close operation. This leaves us with the question of how to type M-structure 
arrays and close them into functional arrays. Below, we discuss some possibilities. 

One possibility is to assign M-structure arrays a separate mutable type constructor, say 
m_vector(r), and then somehow convert the type constructor m_vector into vector when closing. 
Semantically, this is not very clean because it requires an additional type conversion during the 
close operation. Moreover, this scheme does not express the language constraint that the 
layout of M-structure and functional objects is expected to be the same. That constraint is buried
under the semantics of the type conversion operation from M-structure objects to functional 
objects, which is left unspecified. Unsuspecting compiler writers may choose different data 
representations for M-structure and functional objects which would make the close operation 
on M-structure objects incorrect (or extremely inefficient). 

Another possibility is to expand our region algebra to accommodate two different kinds
of mutable objects: I-structure and M-structure. This is easily accomplished by using two
kinds of region variables: $r^I$ denoting I-structure regions, and $r^M$ denoting M-structure regions.
No implicit conversions would be allowed between the two kinds of region variables via type 
substitution or instantiation. The close construct would be used to explicitly close either kind 
of region variable into a null region. It is easy to see that all the semantic machinery presented 
in Chapter 3 would extend trivially to this scheme. 

Under this scheme, a single parameterized type constructor may be used to denote all 
three kinds of arrays: vector($r^M$) for M-structure arrays, vector($r^I$) for I-structure arrays, and
vector($\epsilon$) for functional arrays. The uniform type constructor used in all cases denotes the
language constraint that the underlying data layout should be the same in all three cases. This 
scheme clearly separates the semantic modeling of the layout of an object which is denoted by 
its type constructor, from the modeling of its mutability and synchronization properties which 
is denoted by its region parameters. 
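
The separation between layout (the type constructor) and mutability or synchronization (the region parameter) can be sketched with Python dataclasses. All names here (Kind, Region, Vector, close, can_rename) are hypothetical and serve only to illustrate the scheme described above.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional

class Kind(Enum):
    I_STRUCTURE = "I"
    M_STRUCTURE = "M"

@dataclass(frozen=True)
class Region:
    name: str
    kind: Kind

@dataclass(frozen=True)
class Vector:
    elem: str
    region: Optional[Region]     # None stands for the null region (functional)

def close(vec):
    """Closing changes only the region parameter, never the constructor."""
    return replace(vec, region=None)

def can_rename(old, new):
    """Substitution may rename a region only within the same kind."""
    return old.kind == new.kind

m_vec = Vector("int", Region("r1", Kind.M_STRUCTURE))
i_vec = Vector("int", Region("r2", Kind.I_STRUCTURE))
```

In this sketch both close(m_vec) and close(i_vec) yield the same functional type Vector("int", None), reflecting the requirement that all three kinds of arrays share one data layout.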

It is easy to see that the region algebra may be enriched even further in order to accom- 
modate unsynchronized objects within the same framework. This ability provides a natural 
extension to our type system when adding unsynchronized objects to Id, or adding I-structure 
and M-structure objects to conventional languages such as C or Fortran. 


4.3. Closing General Algebraic Datatypes 


4.3.1 Specification Issues 


General algebraic datatypes introduce yet another dimension in the syntactic specification of 
closable regions and locations. In this section, we informally present some of the issues via 
examples that are formalized in later sections. 


Multiple Region Parameters 


Consider the functional list datatype declaration shown below: 


Example 4.6: 
type list t = nil | cons t (list t);


There are two fields in the cons constructor, either or both of them could be made mutable 
and closed independently. When a field of a datatype becomes mutable, it has to be tagged 
with a region variable which is reflected in the datatype constructor as a region parameter (e.g.,
the type constructors ref($\rho$) or vector($\rho$)). There is some flexibility in deciding whether to add
additional region parameters to a type constructor for each mutable field or tag several mutable 
fields with the same region variable. 

One possibility is to always require the user to specify the distribution of region parameters 
explicitly. On the other hand, it may be possible for the compiler to automatically add the 
region parameters to a mutable datatype declaration according to some fixed strategy. The 
question of whether two mutable fields should be modeled using the same region variable or not 
depends on how the fields are manipulated and closed within the rest of the program, although 
a fixed, compile-time heuristic is probably more desirable. For instance, the compiler could 
simply assign a single region variable per datatype or it could determine the largest independent 
set of region variables that would characterize a given datatype, subject to recursive typing 
constraints. Thus, either of the following declarations for mutable lists would be acceptable, 
although each provides a different degree of flexibility and approximation:

Example 4.7:
type list(r) t = nil | cons (r)!t (r)!(list(r) t);

type list(r1,r2) t = nil | cons (r1)!t (r2)!(list(r1,r2) t);


In the above declarations, we have prefixed a region variable to each of the mutable fields.³
The first declaration identifies the entire spine of the list with the same region, while the second 
declaration classifies heads and tails separately. Whether the first or the second declaration 
should be used depends on whether we wish to close heads of a list without closing the tails 
or vice-versa. In general, it is useful to have as much flexibility as possible, especially if the 
heads and tails employed different memory synchronization protocols (see Section 2.3.5), so the 
second declaration appears to be a better choice. However, note that both fields share the same 
type variable (t), so we will not be able to generalize objects of this list type unless both regions
are closed. Therefore, if we are only concerned about converting mutable lists to completely 
functional lists, then collapsing the two regions into a single one may be more desirable since 
it simplifies the datatype representation. 


Inherited Region Parameters 


Embedding parameterized types within another algebraic datatype forces the type constructor
being defined to inherit the region parameters of the embedded type, otherwise there would be 
no way to generalize such region variables in a Hindley/Milner type system. For example: 


Example 4.8: 
type keyref(r) t = mkkeyref I (ref(r) t);


Although none of the fields of the type keyref itself is mutable, it still must inherit the region
parameter r of the embedded type ref, otherwise this parameter could never be generalized and 
would always point to the same region. This information can easily be taken into account within 
the compiler while computing the region parameters of a datatype declaration automatically. 


Closure Type Parameters 


An interesting problem occurs with general algebraic datatypes that may hide function closures 
inside them. The closure typing system described in Section 2.2.6 works well with higher-order 
functions since we have a way of expressing, propagating, and generalizing over closure types 
directly as they are defined while typing a λ-abstraction or instantiated at a function reference.
But, if a function is carried indirectly by storing it within a data-structure, we must still not lose 
its closure typing information because of such indirection. Otherwise, write handles embedded 
inside such functions could escape undetected. To illustrate this subtle point, consider the 
following example: 


Example 4.9:
type capture t0 = capt (int -(u0)-> t0 -(u1)-> t0);

def escape_5 n =                  % escape_5 :: ∀t0. int -> (vector t0, capture t0)
  close { a = i_vector (1,n);
          def g i v =             % g :: ∀u2 u3. int -(u2)-> t0 -(vector(r) t0, u3)-> t0
            { a[i] = v; in v };
        in a, capt g } :: (vector _), _;


³A dot (.) in front of a field denotes that it is an I-structure field, while a bang (!) denotes that it is an
M-structure field.


As shown above, the datatype capture has a single type parameter $t_0$ and its sole constructor
capt stores a polymorphic function closure. It is necessary to parameterize this type
with the closure extension variables $u_0$, $u_1$ of the hidden function type so that these variables
can also participate in type generalization and close verification. With the declaration shown 
above, the type system will be unable to detect that a write handle to the array a being closed is 
escaping via a function closure since that function is hidden inside a data-structure. We should 
point out that this parameterization is necessary for the closure typing system itself to work 
properly, this is not specifically related to the close construct. Without such parameterizations 
one would be able to /aunder functions with complicated closure types by simply storing them 
into a data-structure and then fetching them back. The correct declaration for the datatype 
capture is shown below with additional closure type parameters: 


Example 4.10: 
type capture t0 u0 u1 = capt (int -(u0)-> t0 -(u1)-> t0);


The closure type parameters on datatypes behave exactly like closure extension variables
within the closure types of functions. For example, the type of capt g now instantiates the extension
variable u1 of the datatype capture with the closure type (vector(r) t0, u3) of the function g,
thereby exposing the hidden region r embedded within the closure type. This allows the
subsequent close verification process to flag the escaping region as a static close-error.
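
The hazard can be made concrete in an ordinary imperative language. The following Python
sketch is only an analogy: Python has no closure types or regions, and the names Capture,
escape, and fn are hypothetical. It shows the run-time behavior that the extra closure
parameters allow the type system to rule out, namely that a function stored inside a
data-structure still carries a write handle to the array.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Capture:
    # Plays the role of the constructor capt: it hides a function closure.
    fn: Callable[[int, int], int]


def escape(n: int):
    a = [0] * n                      # the array the caller will treat as closed

    def g(i: int, v: int) -> int:
        a[i] = v                     # g captures a write handle to a
        return v

    return a, Capture(g)             # the handle leaves the block inside Capture


a, c = escape(3)
before = list(a)                     # snapshot of the supposedly closed array
c.fn(0, 7)                           # update through the hidden closure
print(before, a)
```

Nothing in the result type of escape, read without closure parameters, reveals that the
second component can still update the first.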


4.3.2 Syntactic Specification of Algebraic Datatypes 


Now, we are ready to show the full machinery for the specification of general algebraic datatypes. 
A general algebraic datatype declaration is shown below: 


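\[
\begin{array}{rcl}
\texttt{type}\ T(r_{1..i})\ t_{1..j}\ u_{1..k} & = & C_1\ (\rho_{11})\tau_{11} \cdots (\rho_{1a_1})\tau_{1a_1} \\
 & \vdots & \\
 & | & C_m\ (\rho_{m1})\tau_{m1} \cdots (\rho_{ma_m})\tau_{ma_m}
\end{array}
\]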


This declares a type constructor $T$, with $r_1 \ldots r_i$ as region variable parameters, $t_1 \ldots t_j$ as
type variable parameters, and $u_1 \ldots u_k$ as closure extension variable parameters. This datatype
has $m$ constructor disjuncts $C_1 \ldots C_m$, each with its own arity $a_1 \ldots a_m$, any of which could be
zero. Each field of a non-nullary constructor $C_p$ has an independent type $\tau_{pq}$ and a region
expression $\rho_{pq}$. The type $\tau_{pq}$ may use region, type, and closure variables from the declared
parameters of the datatype $T$. The region expression $\rho_{pq}$ either consists of exactly one region
variable parameter, denoting that this field is mutable, or it is the null region $\epsilon$, denoting that
this field is functional.⁴

The above declaration may be supplied by the user, or the compiler may automatically 
augment an ordinary datatype declaration containing only type variable parameters with addi- 
tional region and closure extension parameters. In order to do so, the user must at least specify 
which fields are expected to be mutable and which ones are functional. Then, a maximally 
independent set of region variable parameters and a set of closure extension parameters may be 
computed for each datatype T declared within the program using the following steps: 


1. First we assign region expressions $\rho^T_{pq}$ to each field of each datatype $T$ declared within the
program as follows:

\[
\rho^T_{pq} =
\begin{cases}
r^T_{pq} & r^T_{pq} \text{ is new and the } pq\text{-th field in the datatype } T \text{ is mutable} \\
\epsilon & \text{otherwise}
\end{cases}
\]

Each datatype $T$ is initially assigned the region parameters $R^T = \bigcup_{p,q} \rho^T_{pq}$ and the closure
extension parameters $U^T = \bigcup_{p,q} \text{Closure-Variables}(\tau_{pq})$.


⁴ Additional syntax may be used to distinguish between I-structure and M-structure fields.


2. Now we construct a datatype reference graph consisting of all the datatypes declared
within the program, where there is an edge from a datatype $T_1$ to another datatype
$T_2$ if $T_2$ occurs within some field type $\tau_{pq}$ of $T_1$. We partition the nodes of this graph
into strongly connected components (SCC) [AHU74] according to this (directed) edge
criterion. This puts mutually recursive datatypes into the same component. We will use
this information to assign the same region and closure extension parameters to mutually
recursive datatypes.


3. Now, proceeding in a topologically bottom-up fashion on each SCC of the above reference
graph, we compute the final set of region and closure extension parameters for each
datatype as follows. If two datatypes $T_1$ and $T_2$ belong to the same SCC, then all occurrences
of one inside the other use the same variables. If $T_1$ refers to $T_2$ and they belong
to different SCCs, then for each occurrence of $T_2$ within the declaration of $T_1$ we rename
the parameters associated with $T_2$ ($R^{T_2}$ and $U^{T_2}$) to fresh variables and recompute the
parameters of $T_1$ ($R^{T_1}$ and $U^{T_1}$).


4. Finally, each datatype $T$ within the same SCC is assigned the region parameters $\bigcup_{T \in SCC} R^T$
and the closure extension parameters $\bigcup_{T \in SCC} U^T$.


Intuitively, the above algorithm assigns a new region variable to each statically distinguish- 
able mutable field keeping track of inherited and recursive regions. In this sense, it computes a 
maximally independent set of region variables for each datatype. For example, this algorithm 
would automatically compute the region assignment (list(r1,r2) t) shown in Example 4.7 for 
the following type declaration which specifies both heads and tails as being mutable: 


Example 4.11: 
type list t = nil | cons !t !(list t);
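

The four steps above can be sketched in Python as follows. This is a simplified
illustration under stated assumptions, not the compiler's implementation: it computes only
region parameters (closure extension parameters would be collected the same way), it names
the variables r1, r2, ... locally per component, and the names Field, sccs_bottom_up, and
region_params are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    mutable: bool
    refs: list = field(default_factory=list)  # datatypes named in the field type


def sccs_bottom_up(graph):
    """Tarjan's algorithm; components are emitted referenced-first (bottom-up)."""
    index, low, on_stack, stack, out = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:               # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            out.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return out


def region_params(datatypes):
    # Step 2: reference graph between datatypes.
    graph = {t: [r for f in fs for r in f.refs] for t, fs in datatypes.items()}
    params = {}
    for comp in sccs_bottom_up(graph):       # Step 3: bottom-up over SCCs
        members, count = set(comp), 0
        for t in sorted(comp):
            for f in datatypes[t]:
                if f.mutable:                # Step 1: new variable per mutable field
                    count += 1
                for ref in f.refs:
                    if ref not in members:   # inherited regions, renamed afresh
                        count += len(params[ref])
        shared = [f"r{i}" for i in range(1, count + 1)]
        for t in comp:                       # Step 4: same parameters across the SCC
            params[t] = list(shared)
    return params


decls = {
    "ref": [Field(True)],
    "keyref": [Field(False), Field(False, ["ref"])],
    "list": [Field(True), Field(True, ["list"])],
}
print(region_params(decls))
```

For the list declaration of Example 4.11 the sketch yields two region parameters, and for
keyref of Example 4.8 it yields the single region inherited from ref.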


4.3.3 Dynamic Semantics


Dynamically, each constructor disjunct $C_p$ gives rise to a value $\langle C_p\ v_1 \cdots v_{a_p} \rangle$ where $C_p$ denotes
a tag that identifies the disjunct and $v_1 \cdots v_{a_p}$ are its field values. The value corresponding
to a mutable field is a unique location $l_{pq}$ whose contents are accessible through the store.
This generalized representation subsumes the functional n-tuples ($\langle n\text{-tup}\ v_1, \ldots, v_n \rangle$) and single
mutable reference cells ($l$) used in Chapter 3 because it permits individual locations of a tuple
itself to be mutable. In order to avoid confusion, we now represent individual mutable reference
cells such as those used in Chapter 3 using the following datatype declaration:


Example 4.12:
type ref(r) t = ref (r)!t;


A mutable reference cell would now be represented as $\langle \mathit{ref}\ l \rangle$ instead of a bare location $l$,
which by itself is no longer considered to be a proper value and may only appear as a mutable
field value within a constructor value.⁵

The locations directly contained in a constructor value, $\mathcal{L}(\langle C_p\ v_1 \cdots v_{a_p} \rangle)$, are naturally
defined to be the set of field values that are locations. Similarly, the reachable locations of
a constructor value (with respect to a store $s$) are the set of locations directly or indirectly
reachable from all the fields of the constructor.

The primitive operations of allocation, dereference, and assignment extend naturally to con- 
structor disjuncts and their embedded mutable and non-mutable fields. The reader is referred 
to [Nik91] for details of the exact syntax used in Id. The dynamic semantics of these operations 
is given by a family of allocation, dereference, and assignment rules on the lines of those shown 
for reference cells in Chapter 3. 

The dynamic semantics of closing a constructor value follows the discussion in Section 4.1. 
The main problem is to identify the set of dynamic locations to match the specified region 
variables that are being closed in a general algebraic datatype. For non-recursive datatypes, 
the locations to be closed are exactly those carried directly within the constructor value at the 
field position corresponding to the region variable being closed. As an example, we reproduce 
the point datatype from Example 2.17 below with explicit region parameters. Both fields of 
the point pt1 are closed while only the second field of pt2 is closed:


Example 4.13: 
type point(r1,r2) = pt (r1)!float (r2)!float;


pt1 = close (pt 1.2 3.5) :: point;         % Abbreviation for point(ε,ε)
pt2 = close (pt 2.2 4.7) :: point(_,ε);


For recursive datatypes, the value contained within each field that recursively refers to the 
region variable being closed must also be traversed and closed. Consider the following example 
using mutable lists: 


Example 4.14: 
type list(r1,r2) t = nil | cons (r1)!t (r2)!(list(r1,r2) t);


l1 = close (1:2:3:4:nil) :: (list(ε,_) int);


The dynamic implication of closing the first region parameter $r_1$ of the list l1 is that all
head fields on the spine of the list get closed, although the tail fields still remain mutable (since
$r_2$ is not closed). This is because after closing the head field of the first cons-cell, we must
recursively traverse its tail field in order to close the region parameter $r_1$ in the remaining list.
This process continues until we hit nil in the tail field since there are no more fields to recurse
into.
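
The traversal can be pictured with the following Python sketch. It is an illustration only:
the per-field flags head_closed and tail_closed, and the names Cons, close_r1, and set_head,
are hypothetical stand-ins for the region information, which in the type system described
here is checked statically rather than stored in the cells.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cons:
    head: int
    tail: Optional["Cons"]           # None plays the role of nil
    head_closed: bool = False        # set once region r1 is closed for this cell
    tail_closed: bool = False        # set once region r2 is closed for this cell


def close_r1(lst: Optional[Cons]) -> Optional[Cons]:
    """Close the head field of every cons-cell on the spine of lst."""
    cell = lst
    while cell is not None:          # stop at nil: no more fields to recurse into
        cell.head_closed = True
        cell = cell.tail             # follow the tail, which itself stays mutable
    return lst


def set_head(cell: Cons, v: int) -> None:
    if cell.head_closed:
        raise TypeError("head field is closed")
    cell.head = v


l1 = close_r1(Cons(1, Cons(2, Cons(3, Cons(4, None)))))
```

After the call, every head on the spine is marked closed while every tail can still be
reassigned, which mirrors the annotation (list(ε,_) int).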

Now, we show a real example involving recursive datatypes that shows the usefulness of the 
close construct in building functional objects from the corresponding mutable ones. We present 
an efficient implementation of the map_list function that does not even require reversing the 
final list (cf. function imp_map in Example 2.6) because the list is generated from left to right
using a technique known as “open-lists” [ANP89]: 


⁵We abuse our notation slightly by calling locations embedded inside a constructor value field values, just
like the other values present directly within the constructor, although bare locations are no longer considered to
be proper values. They only serve to define the domain of the mutable store.


Example 4.15:
def map_list f nil = nil
 |  map_list f (x:xs) =
      close {
        hd = cons _ _;            % The expression (cons _ _) allocates a ⟨cons l1 l2⟩
        hd.cons_1 = f x;
        tl = hd;
        finaltl = { while not (nil? xs) do
                      newtl = cons _ _;
                      next x : next xs = xs;
                      newtl.cons_1 = f x;
                      tl.cons_2 = newtl;
                      next tl = newtl;
                    finally tl };
        finaltl.cons_2 = nil;     % The expression nil allocates a ⟨nil⟩
      in hd } :: (list _);        % Abbreviation for (list(ε,ε) _)
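

The same open-list technique can be sketched in Python. This is an illustrative analogue
rather than a translation of the Id semantics: Cell, map_list, and to_pylist are
hypothetical names, None stands for nil, and nothing here enforces that the result is
read-only after construction, which is exactly what close adds.

```python
class Cell:
    """A cons-cell whose fields start out empty, like (cons _ _)."""
    __slots__ = ("head", "tail")

    def __init__(self):
        self.head = None
        self.tail = None


def map_list(f, xs):
    it = iter(xs)
    try:
        first = next(it)
    except StopIteration:
        return None                  # mapping over nil gives nil

    hd = Cell()                      # first cell of the result
    hd.head = f(first)
    tl = hd                          # tl always points at the open end
    for x in it:
        newtl = Cell()
        newtl.head = f(x)
        tl.tail = newtl              # fill the open tail of the previous cell
        tl = newtl
    tl.tail = None                   # terminate the list
    return hd


def to_pylist(cell):
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out


print(to_pylist(map_list(lambda v: v * v, [1, 2, 3])))
```

The result is built in a single left-to-right pass, so no reversal of the final list is
needed.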


Finally, observe that the set of locations that need to be examined for closing a given region
variable in a general algebraic datatype depends solely on its type declaration. For instance,
we know at the time of declaring the list datatype (Example 4.14) that the region variable
$r_1$ occurs inside the type of its tail field. Therefore, we need to examine all the cons-cells on
the spine of the list in order to close the region variable $r_1$. But we do not have to examine
the objects contained within the head fields in order to close the region $r_1$. If $r_1$ occurred
inside the type of the objects contained within the head fields, then the static semantics for the
close operation described below would generate a static close-error and such a program would
be rejected. Thus, an exact dynamic CLOSE rule can always be constructed for each region
variable of a polymorphic, user-defined datatype at the time that datatype is declared without
regard to how it is instantiated at various places within a program.


4.3.4 Static Semantics 


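The free variables of a general algebraic datatype are defined as follows:

\[
\mathcal{F}(T(\rho_{1..i})\ \tau_{1..j}\ \pi_{1..k}) = \bigcup_i \mathcal{F}(\rho_i) \cup \bigcup_j \mathcal{F}(\tau_j) \cup \bigcup_k \mathcal{F}(\pi_k)
\]


The dangerous variables of a general algebraic datatype may either be dangerous within one
of its argument types $\tau_j$ or closure types $\pi_k$, or they may occur within the type of a mutable
field of one of its constructors. In the latter case, all the type variable parameters occurring
within that field are inherently dangerous, much like the type of an object contained within a
mutable reference cell. Therefore, we define:

\[
\mathcal{D}(T(\rho_{1..i})\ \tau_{1..j}\ \pi_{1..k}) = \bigcup_i \mathcal{F}(\rho_i) \cup \bigcup_k \mathcal{D}(\pi_k) \cup \bigcup_j
\begin{cases}
\mathcal{F}(\tau_j) & \text{if } t_j \text{ occurs inside a mutable field} \\
\mathcal{D}(\tau_j) & \text{otherwise}
\end{cases}
\]


Finally, the dangerous region variables of a general algebraic datatype are defined as follows:

\[
\mathcal{R}(T(\rho_{1..i})\ \tau_{1..j}\ \pi_{1..k}) = \bigcup_i \mathcal{F}(\rho_i) \cup \bigcup_j \mathcal{R}(\tau_j) \cup \bigcup_k \mathcal{R}(\pi_k)
\]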


The types of the primitive operators for allocation, dereference, and assignment of constructors
and their fields are defined as expected.

The static CLOSE rule also follows the discussion of Section 4.1. We only need to show how
to perform the verification for flagging a static close-error for algebraic datatypes. This is done
as follows:


1. Given a type-annotated expression, (close $e$) :: $T(\rho_{1..i})\ \tau_{1..j}\ \pi_{1..k}$, along with an
inferred type $T(\rho'_{1..i})\ \tau'_{1..j}\ \pi'_{1..k}$, first we match the regions $\rho_{1..i}$ specified in the annotation
against the corresponding regions $\rho'_{1..i}$ of the inferred type. Null regions in the inferred
type must exactly match the corresponding regions in the annotation type. While some
region variables in the inferred type may be constrained to be closed (mapped to $\epsilon$),
other region variables are simply renamed/unified to the region variable specified in the
constraint.


2. The candidate region variables so determined to be closed, say $\{r_1 \ldots r_n\}$, must not occur
inside a function closure type within the inferred type parameters $\tau'_{1..j}$ or within the
inferred closure parameters $\pi'_{1..k}$. This ensures that we do not close region variables that
are captured inside function closure types.


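3. Finally, the region variables being closed must satisfy the following test with respect to
the annotation type:

\[
\forall r \in \{r_1 \ldots r_n\}.\ r \notin \Bigl[ \mathcal{F}(E) \cup \bigcup_j \mathcal{F}(\tau_j) \cup \bigcup_k \mathcal{F}(\pi_k) \Bigr]
\]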


If any of the above tests fails, we flag a static close-error. Otherwise, the close operation is 
considered to be successful. 
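
The three tests can be sketched in Python as follows. This is a deliberately simplified
model under several assumptions: regions are strings, the null region is represented by
None, and types are abstracted to the sets of region variables they mention. The names
check_close, CloseError, closure_regions, annot_free, and env_free are hypothetical, and
reading E in the third test as an enclosing type environment is our assumption.

```python
EPS = None  # the null region


class CloseError(Exception):
    """Raised where the text says a static close-error is flagged."""


def check_close(inferred, annotated, closure_regions, annot_free, env_free):
    """inferred/annotated: region arguments of T, position by position.
    closure_regions: regions inside function closure types of the inferred
    type and closure parameters. annot_free/env_free: free variables of the
    annotation's parameters and of the environment (assumed reading of E)."""
    if len(inferred) != len(annotated):
        raise CloseError("region arity mismatch")

    closed, renaming = set(), {}
    for inf, ann in zip(inferred, annotated):          # test 1
        if inf is EPS:
            if ann is not EPS:
                raise CloseError("a null region must match a null region")
        elif ann is EPS:
            closed.add(inf)                            # candidate to be closed
        else:
            renaming[inf] = ann                        # merely renamed/unified

    if closed & set(closure_regions):                  # test 2
        raise CloseError("region captured inside a function closure type")

    if closed & (set(annot_free) | set(env_free)):     # test 3
        raise CloseError("closed region is still free outside the close")

    return closed, renaming


# list(r1,r2) int annotated as list(EPS,_) int: only r1 is closed.
print(check_close(["r1", "r2"], [EPS, "_"], set(), set(), set()))
```

With the corrected declaration of Example 4.10, the region r of the array appears among
closure_regions, so the second test rejects the program of Example 4.9.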


4.3.5 Soundness 


The static and dynamic CLOSE rules for general algebraic datatypes described above are direct 
extensions of the formal machinery shown for reference cells in Chapter 3. It is reasonably 
straightforward to see that we follow the same idea of specifying a fixed set of static regions 
to be closed for an identifiable set of dynamic locations. Therefore, all the semantic machinery 
given in Chapter 3 extends naturally to this framework. 


4.4 Functional Encapsulation in Conventional Languages 


We mentioned in Section 1.3 that the functional encapsulation mechanism presented in this 
thesis would also be quite useful in a monomorphic, first-order language such as C, Pascal, or 
Fortran. However, adding this mechanism to a conventional language may require a few changes 
in the language and its type system, a possible change in the programming style, as well as 
possible simplifications within the proposed type system itself. In this section, we outline how 
all this might be achieved using C as an example. 

It is clear that in order to make any kind of guarantees based on the type system, we 
must have a strongly-typed language. C is not strongly-typed because it allows unrestricted 
type conversion among objects at the discretion of the user via type-casting [KR88]. Using
this facility the user may convert pointers to closable objects into non-pointer datatypes and 
vice-versa, thereby completely throwing off our type analysis. Therefore, no type-casting may 
be allowed in order to ensure sound, verifiable functional encapsulation. 

The type system of C would obviously need to be extended with regions, although with
suitably chosen syntactic defaults regions may not appear explicitly in many cases. For instance,
the compiler may automatically assign region parameters to all struct and union type
declarations as discussed in Section 4.3.2. The compiler would also need to define a unique
memory allocation function for each declared datatype. This is necessary because, as discussed
above, we have to eliminate the use of type-casts, which are most often used to fix the type of a
freshly allocated object obtained from the only available memory allocation function, malloc.

The most important simplification in our type mechanism would be that we would no longer
need closure types. Although C allows passing function pointers as arguments and results,
functions are only declared at the top-level and they may only have free identifiers that are also
declared at the top-level. Therefore, the types of such free identifiers would always be visible
within the global type environment and can never be closed accidentally. In other words, we
do not need to keep track of the types of the free identifiers of a function because such types
would always be present in its enclosing type environment anyway.⁶ This greatly simplifies our
typing machinery and makes it even more intuitive and easy to use.

Finally, we must point out that functional encapsulation is useful only if we localize the 
allocation and construction of objects to nested program blocks. This facility encourages a 
programming style where we dynamically allocate and update an object in a deeply nested 
block, and then close and return that object into the enclosing block where it may be used 
functionally. This style is certainly possible in C and Pascal but may preclude some earlier 
versions of Fortran due to the lack of block-structure and dynamic memory allocation. 


4.5 Conclusions 


4.5.1 Summary of Part I 


In the preceding chapters we have presented a powerful type system that fulfills our goal of
providing a sound and verifiable type abstraction mechanism between the high-level functional
layer and the low-level imperative layer of a polymorphic programming language. We started
with the problem of implementing functional array constructs present in our high-level language
in terms of low-level imperative program fragments written in a small kernel language without
sacrificing storage efficiency or parallelism. In the process, we introduced a new construct
within the kernel language called “close” that changes the view of a mutable data-structure
from an imperative to a functional one. The type system statically verifies the soundness of
such a change and guarantees that successfully closed objects are never updated again during
execution.

We also showed how to extend the use of the close construct to complex data-structures 
within the language including arrays, tuples, functions, and general algebraic datatypes. We 
discussed issues of language design and specification of closing such data-structures and its 
effect on other language features such as type polymorphism and dynamic memory synchro- 
nization protocols. Our proposal for syntactically specifying closable objects blends nicely with 
already existing mechanisms of specifying type declarations and type annotations for program 
expressions. 

The type abstraction mechanism described in this thesis helps both compiler and language
designers as well as the end-users. On the one hand, it helps to reduce the size of the compiler
by permitting efficient implementations of high-level, functional constructs (e.g., make_vector
in Example 4.5 and map_list in Example 4.15) to be pushed into system libraries rather than
being implemented within the compiler as primitives. On the other hand, it provides a tool
for the end-user to design arbitrary new functional data-structures more efficiently using
imperative kernel constructs and then safely close them (e.g., histogram in Example 2.16 and
polar2rect in Example 4.4). In this sense, our type system provides a safe and controlled
abstraction mechanism for the end-user to exploit the power and efficiency of low-level, imperative
constructs without destroying the clean semantics of high-level constructs.


⁶This is also true in Pascal and Fortran even though Pascal allows internal function declarations [JW75]. This
is because in all these languages functions are never passed outside the scope of their definitions.


4.5.2 Implementation Status 


The type system described in this thesis is currently unimplemented. Therefore, our claims of 
displacing wired-in implementation of functional data constructors within the Id compiler in 
favor of system libraries, and user-level flexibility in implementing new functional abstractions 
are yet to be tested. Currently, the Id compiler uses several internal “hacks” to provide these 
functional abstractions which would clearly be unsound if exposed to the user directly.⁷ Our
typing machinery would have the effect of cleaning and legitimizing these hacks into proper 
kernel language features. Our type system would also combine three different type declara- 
tions used for M-structure, I-structure, and functional data objects into a single declaration as 
discussed in Section 4.2.4. 

Currently, the Id language is undergoing major revisions and in its next incarnation as pH 
[NAH93] we hope to include some of the ideas embodied in this thesis. 


4.5.3 Future Work


As mentioned above, the obvious first task for us is to implement this type system fully and 
study its usefulness not only in terms of the semantic cleanliness but also its implementation 
efficiency and ease of use. We would like to implement this system both for Id (and pH) as 
well as a restricted subset of the C language as outlined in Section 4.4. Below, we discuss some 
alternate directions for future research. 


Theoretical Improvements


There are several aspects of the current research that need more detailed scrutiny. Throughout
this thesis, we have used a strict, sequential dynamic semantics for our kernel language.
We were able to do this because the problem of closing imperative data-structures is largely 
orthogonal to the issues of parallelism and synchronization which would have only made the 
formalization of the soundness proofs much harder. But it would be useful to show the sound- 
ness proofs directly in a parallel setting. This would also allow us to directly model the different 
closing strategies required with different memory synchronization protocols as discussed in Sec- 
tion 2.3.5. We feel that a graph rewriting framework such as [AA93] would be more appropriate 
for this purpose than the relational semantics approach taken here. 


Applications to Other Compiler Analyses 


This type system may also be used to infer useful static information that is conventionally
determined using dataflow analysis or abstract interpretation. For example, we know that the
static verification strategy for the close construct provides a limited form of object escape
analysis. It guarantees that there are no additional references to the object being closed other
than the reference being returned from the close expression. This implies that the enclosing
program fragment that receives the closed object has exclusive access to that object. If we do
not make the object read-only upon closing, then this type mechanism effectively provides a
static way of verifying exclusive dynamic access to a mutable object without using any
synchronization primitives (such as semaphores) or single-threading the object through the entire
program. The enclosing program fragment could make exclusive, unsynchronized read/write
accesses to the object for some time and then pass out multiple references to other sub-programs.
All such references may be brought together again and checked for escape in an enclosing
scope.


⁷The current version of the Id compiler uses typeconverter declarations that simply change the type of an
object without any semantic verification. It also uses internal pragmas to “fix” the functional polymorphism of
array and list comprehension desugaring.

Another important observation is that the dynamic life-time of an object that is shown to be
closable at the boundary of a close expression, but is not actually returned from that expression,
is guaranteed to be bound to the scope of that close expression. This is because no references
to that object may escape this scope. This information may be used to allocate such objects
on the stack instead of the heap as shown in [TT93], or to insert additional code at compile-time to
reclaim that storage automatically on the lines of [HJ92].


Part II


Types in Run-time System Design:
Type Reconstruction


Chapter 5


A Typed Run-time System 


5.1 Introduction 


Traditionally, programming environments of dynamically-typed languages such as Lisp or Small- 
talk maintain type information in the form of run-time type descriptors on every object. This 
information may be used, for instance, to detect run-time type-errors, to dispatch to different 
handlers for a given operation based on the type of the arguments, and to distinguish pointer 
data from non-pointer data for the purpose of garbage collection. Although very flexible in 
design, such language implementations pay the price of managing type-tags either in the form 
of complex specialized hardware or in the form of extra space and time requirements in software. 

Languages geared towards high performance computation such as C or Fortran take the 
other extreme. They aim for a very simple and efficient run-time system with no type informa- 
tion to be maintained at run-time. The user is made directly responsible for complex tasks that 
may require run-time type information such as ensuring type consistency and automatic storage 
management. If necessary, the compilers for these languages can be explicitly instructed to gen- 
erate static type information to be used for specific run-time applications such as source-level 
debugging. 

Several important questions arise at this point. What is the advantage of having type infor- 
mation available at run-time? What specific applications may use run-time type information? 
How much type information is desired, complete source-level types or a partial specification? 
What language design features may help or complicate the task of making run-time type in- 
formation available? How much of this type information can be pre-computed by the compiler 
and how? Do we need to carry the type information throughout execution or can it be recon- 
structed on demand? What is the run-time cost of such type maintenance or reconstruction? 
And finally, how does a typed language and its run-time system compare in terms of overall 
performance, program reliability, and user flexibility to other systems? 

In Part II of this thesis, we attempt to answer some of the above questions in the context of 
the Id programming language and its run-time environment. We study how source-level type 
information can be propagated through the compiler and made available during the execution 
of a program. We also discuss specific applications that use this information at run-time. 


5.2 Design Issues for a Typed Run-time System 


Several language design features affect the availability and the accuracy of type information
during the execution of a program. Likewise, run-time system design decisions affect the overall
cost of computing and propagating this type information. Figure 5.1 shows several such design
issues and classifies some existing programming languages on their basis. We discuss these
issues below.


5.2.1 Strong vs. Weak Typing 


Strongly-typed languages such as Pascal, Lisp, or Standard ML provide a consistent model of 
assigning a type to every data object and every sub-computation in a program. Computations 
are allowed to proceed only if provided with objects of the right type. Enforcing type consistency 
allows run-time type information to serve as a reliable description of the computation being 
performed at any time. Therefore, it makes sense to use this information, if available, for 
applications that operate on a wide variety of run-time data and need some mechanism to 
identify and distinguish among them. Applications such as displaying objects in a source 
debugger, marking objects in a garbage collector, and object I/O fall into this category. 

Weakly-typed languages such as C or Fortran permit the user to arbitrarily coerce the type
of an object to another type. This makes the currently assigned type of an object a poor
description of its actual contents. It is still possible to view an object according to its currently
assigned type, but there is no guarantee that it provides a complete and accurate description
of the object. Therefore, providing reliable type information at run-time is possible only in a
strongly-typed system.


5.2.2 Static vs. Dynamic Typing 


Compilers for statically-typed languages such as Pascal or Standard ML enforce the type
consistency expected from a strongly-typed program at compile-time. This frees up the system
from the responsibility of checking for type consistency at run-time. Some modern languages
like Haskell also provide systematic mechanisms to resolve overloading of operators and selection
of methods at compile-time based on the static types of their arguments [WB89]. Therefore,
static typing offers many of the advantages of dynamic availability of type information without
actually carrying that information at run-time. Moreover, all the static type information may
be saved and used in optimizations during the compilation phase itself or in other run-time
applications during program execution, although additional work may be needed to reproduce
the desired information at run-time when demanded.


5.2.3 Tagged vs. Untagged Object Model 


A simple way to provide type information at run-time is to tag every object: a few bits (usually 
one or two) in every word may be used as a tag to distinguish scalar objects from pointers to 
heap objects. More information about the type and size of objects may be kept in an object 
header. All dynamically-typed languages such as Lisp and Smalltalk use extensive tagging of 
objects in order to perform type consistency checks at run-time. Some implementations of 
statically-typed languages such as the Standard ML of New Jersey [App90] also make use of 
object tagging, usually for the benefit of the garbage collector. 

Tagging every object is costly. Keeping tag bits in every word reduces the range of repre- 
sentable scalars and pointers in conventional architectures, and the user application also pays 
the additional cost of tag maintenance. Sometimes, scalar values (usually floating point num- 
bers) may be boxed in a heap data-structure in order to preserve their full range. This incurs 
the additional cost of allocating the box and accessing it indirectly. 
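
The cost in range can be seen in a small Python sketch of one-bit tagging. The layout is an
assumption chosen for illustration (a 32-bit word, low bit 1 for scalars, even addresses
for pointers); it is not the layout of any particular system cited here, and the names
tag_int, tag_ptr, is_int, and untag_int are hypothetical.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def tag_int(n: int) -> int:
    """Encode a scalar in a word whose low bit is 1; one bit of range is lost."""
    lo, hi = -(1 << (WORD_BITS - 2)), (1 << (WORD_BITS - 2)) - 1
    if not lo <= n <= hi:
        raise OverflowError("integer does not fit in a tagged word")
    return ((n << 1) | 1) & WORD_MASK


def tag_ptr(addr: int) -> int:
    """Pointers are assumed word-aligned, so their low bit is already 0."""
    if addr % 2 != 0:
        raise ValueError("unaligned pointer")
    return addr & WORD_MASK


def is_int(word: int) -> bool:
    return (word & 1) == 1


def untag_int(word: int) -> int:
    if word >= 1 << (WORD_BITS - 1):     # reinterpret as a signed 32-bit value
        word -= 1 << WORD_BITS
    return word >> 1                     # arithmetic shift drops the tag bit


print(untag_int(tag_int(-3)), is_int(tag_ptr(0x1000)))
```

Every arithmetic operation on tagged scalars has to strip and restore the tag in this way,
which is the tag maintenance cost mentioned above.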

Keeping objects untagged simplifies the memory model and eliminates the space and time 
overheads, but no type information is directly available at run-time. In weakly-typed languages 
such as C or Fortran, the user is directly held responsible for generating and propagating 
consistent type information at run-time. In statically-typed languages such as Pascal or Id, the 
compiler and the run-time system may share the responsibility for carrying the type information. 
The compiler may generate detailed symbol tables for each function in the program. The run- 
time system may load and process the information before program execution or upon request 
from another application. 


5.2.4 Type Maintenance vs. Type Reconstruction 


Recently, several type reconstruction schemes have been proposed for statically-typed poly- 
morphic languages that do not incur the run-time tag management overhead [App89, Gol91, 
GG92]. In these schemes, static type information may be combined with clues from the dy- 
namic state of the machine (the call stack) to automatically reconstruct the run-time type of 
most run-time objects. Therefore, with a small cost of type reconstruction, the type-tags on 
such objects may be safely dropped without compromising the ability to determine their exact 
run-time types. 
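
The general idea of combining static type information with the call stack can be pictured
with the toy Python sketch below. It is not the scheme of [App89, Gol91, GG92] nor the
method developed later in this thesis; the frame format, the type encoding (strings for
base types and type variables, tuples for constructed types), and the names substitute and
reconstruct are all assumptions made for illustration.

```python
def substitute(ty, env):
    """Replace type variables in ty by their bindings in env."""
    if isinstance(ty, str):
        return env.get(ty, ty)               # base types are left unchanged
    return (ty[0],) + tuple(substitute(arg, env) for arg in ty[1:])


def reconstruct(frames):
    """frames: outermost activation first. Each frame is a triple
    (name, instantiation, local_types), where instantiation maps the callee's
    type variables to types written in terms of the caller's type variables,
    as recorded statically at the call site."""
    env, result = {}, []
    for name, instantiation, local_types in frames:
        # Resolve this frame's type variables using the caller's bindings.
        env = {tv: substitute(ty, env) for tv, ty in instantiation.items()}
        result.append(
            (name, {var: substitute(ty, env) for var, ty in local_types.items()})
        )
    return result


stack = [
    ("main", {}, {"xs": ("list", "int")}),
    ("map_list", {"a": "int", "b": ("list", "int")},
     {"x": "a", "ys": ("list", "b")}),
    ("length", {"c": "b"}, {"l": ("list", "c")}),
]
for name, types in reconstruct(stack):
    print(name, types)
```

No object carries a tag here: the exact type of l in the innermost frame is recovered only
by following the chain of instantiations down from the outermost frame.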

If the semantics of a language necessitates a tagged or boxed representation for objects, or 
if special hardware support for tags is available, then run-time type reconstruction is probably 
not the right choice. For example, compiler-directed type reconstruction is impossible in a 
dynamically-typed language such as Lisp because the language does not enforce sufficient static 
type restrictions on user programs in order for a compiler to gather all the necessary type 
information for later reconstruction. Maintaining tags on every object is the only way to ensure 
dynamic type consistency. Similarly, in the implementation of lazy languages such as Haskell 
[PJ92], all objects are boxed into closures to ensure lazy evaluation semantics. These closures 
can easily identify themselves and the object they contain via their code pointers. Independent 
type reconstruction does not provide any advantage in this situation. 

However, for the class of statically-typed languages that follow applicative-order evaluation,¹
type reconstruction enables substantial representational savings without sacrificing any run-time
information. The object representations can be made clean and simple just like in C
and Fortran, without compromising type consistency or the ability to use type information at
run-time. Of course, we need to ensure that complete type reconstruction is possible for all
run-time objects under all circumstances. The existing schemes [App89, Gol91, GG92]
do not provide this guarantee. In particular, polymorphism and higher-order functions pose
significant problems, as discussed below.


¹By applicative-order evaluation, we mean languages that evaluate function arguments before or in parallel
with the invocation of the function.


5.2.5 Polymorphism and Higher-order Functions 


Language features such as polymorphism and higher-order functions significantly complicate 
the problem of making exact type information available in a run-time system with untagged 
objects. Polymorphic functions are designed to be reusable with various types of data objects, 
therefore no clue about the type of an object may be associated with the definition of such 
a function. The exact run-time type of a particular application of a polymorphic function is 
usually an instantiation of its static type and must be derived from the use of the function at 
that application site. The run-time system needs to compute such instantiations upon a type 
reconstruction request. 

Similarly, higher-order functions take function closures as arguments and produce closures 
as results. These function closures may encapsulate hidden objects that are bound to the free 
identifiers of the function. Unfortunately, even an exact instantiation of the type of a function 
closure may not reflect the types of the objects captured within its environment. Therefore, the 
types of objects hidden within higher-order function closures may be impossible to reconstruct. 
We will examine some of these problems and their possible solutions in Chapter 6. 


5.2.6 Type Inference vs. Type Declaration 


Type inference is a convenient mechanism that frees the user from the task of declaring every 
identifier in the program with an appropriate type. Most modern programming languages such 
as Standard ML, Haskell, and Id use a systematic type inference system [Mil78]. Even languages 
favoring type declaration such as Pascal and C perform some ad hoc type inference in order to
support automatic type coercions. 

Type reconstruction may be thought of as run-time type inference on the dynamic state of
the computation, although a large amount of that information is pre-computed statically within
the compiler. The use of type reconstruction at run-time is orthogonal to whether the compiler
uses type inference or type declarations in order to collect the necessary static type information.
Providing the type information within the program in the form of type declarations does not
reduce the complexity of making that information available at run-time. The compiler still
has the task of saving all the necessary information in the appropriate form and making sure
that complete type reconstruction is possible for all objects at run-time in spite of the problems
discussed above.


5.3 Our Approach 


Id is a strongly and statically-typed language. Furthermore, it supports a polymorphic type 
inference system and uses an untagged run-time system. Our goal is to use run-time type 
reconstruction in order to determine the exact type of all objects within the Id run-time system. 
As mentioned earlier, the existing schemes [App89, Gol91, GG92] are unable to reconstruct the
types of some objects. We would like to fix this situation so that the exact type of all run-time 
objects may be reconstructed automatically. 

Our proposed scheme lies somewhere in-between the two extremes of complete run-time 
tagging of objects (à la Lisp, Standard ML) and carrying no type information at all (à la C)
without compromising the goal of complete run-time type reconstructibility. We do not tag 
every run-time object, although a small amount of explicit type information may have to be 
carried within some higher-order, polymorphic functions in order to allow complete run-time 
type reconstruction. We analyze the user program at compile-time to detect such cases and 
insert the additional type information automatically. Essentially, our scheme can be viewed 
as compiler-directed explicit tagging for such run-time objects. We also provide a type re- 
construction algorithm and prove its correctness. The success of our scheme depends on the 
fact that the explicit type information needs to be inserted in very few cases that essentially 
plug the informational holes in the previous schemes and that it can be set up by the compiler 
automatically with little run-time support and overhead. 

The main contribution of this work is that we guarantee complete type reconstruction. As 
we will see in Chapter 7, our current system slightly restricts the acceptable set of type-correct 
programs in order to provide this guarantee. On the other hand, this guarantee opens the way 
for a universal framework for supporting various language and system applications that need to 
use exact object type information at run-time. We discuss some of these applications below. 


5.4 Applications of Complete Run-time Type Reconstruction 


5.4.1 Polymorphic Source Debugging 


A source debugger for a statically-typed, polymorphic language is an ideal application for run-time
type reconstruction. In a debugger, it may be necessary to display the values of any or
all of the variables associated with a given procedure activation. Without any help from the
run-time system, the static type signatures of polymorphic objects are usually insufficient to
traverse and display their full contents. For example, the append function on lists has the
polymorphic static type $\forall t_0.\ (list\ t_0) \rightarrow (list\ t_0) \rightarrow (list\ t_0)$. The function may be used in
various contexts to append various kinds of lists. In each case, we need to reconstruct the full
run-time type of its arguments in order to display their contents appropriately to the user.

Another interesting property of source debugging is that type reconstruction is required only 
for those objects (or function activation frames) that are requested by the user for displaying. 
The entire state of the machine need not be reconstructed at once. Moreover, debugging does 
not impose any serious performance constraints for type reconstruction. Users are generally 
willing to tolerate a reasonable cost for displaying an object which would now also include the 
cost of reconstructing its type.


5.4.2 Tagless Garbage Collection 


Type reconstruction may also be used within a run-time system in order to perform garbage 
collection without maintaining any type information on the heap objects themselves. Ab- 
stractly, a garbage collector performs two functions: it distinguishes live objects from those 
that are garbage (live-object detection), and it reclaims the storage allocated to objects that 
are garbage (dead-object reclamation). For live-object detection, the garbage collector must be 
able to distinguish scalar objects from heap-allocated objects and determine their sizes (object
identification). The actual type of an object is very useful for this purpose. 

Conventional techniques for object identification operate with a very simple memory model 
and make little or no use of language and compiler-specific information. Pointers may be tagged 
using one bit to distinguish them from scalar values and objects may be provided with header
tags or may be allocated in separate areas of memory to keep track of their size. The reader is
referred to a recent survey of such techniques in [Wil92].

Unlike source debugging, garbage collection does not require complete source type information
per se, but additional type information may be helpful in optimizing the marking of
live objects. For instance, it may be possible to entirely skip the traversal of large arrays while 
searching for embedded pointers to heap objects, if the exact run-time type of their elements 
turns out to be a scalar. Clever compilers and run-time systems that tag every object [App90] 
may sometimes be able to encode such information within the header of the array if its type 
is statically known to be a scalar, but this is not possible with polymorphic array construc- 
tors such as the make_vector function of Example 2.1 which could be used in both scalar and 
structured array computations. 

An alternative solution for object identification is to use complete run-time type reconstruc- 
tion. This technique enables garbage collection to be performed in an untagged run-time system, 
saving valuable application time and space spent in continuous tag maintenance. Complete type 
information also paves the way to type-based optimizations in marking flat data-structures as 
discussed above. But, one has to weigh these advantages against the cost of performing type 
reconstruction whenever garbage collection is requested. 

As an example, a simple “mark-and-sweep” tagless garbage collector would work as follows. 
When garbage collection is initiated, the first step would be to reconstruct the types of the 
root set of heap objects that are either stored in global variables or pointed at from within the 
function activation frames. The reconstructed type information would then be used to guide 
the garbage collector in identifying and traversing the reachable heap objects and marking them 
as live. Finally, unmarked objects would be reclaimed as garbage. We describe such a scheme 
in Chapter 8. 
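
The mark-and-sweep outline above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the scheme of Chapter 8: the heap is modelled as a dictionary from hypothetical integer addresses to lists of fields, nil is represented by None, and the types of the roots are assumed to have been reconstructed already. The names SCALARS, mark, and collect are ours.

```python
# Types: a str names a scalar type; a tuple (constructor, element_type)
# names a heap-allocated structure, e.g. ("list", "int").
SCALARS = {"int", "float", "bool"}

def mark(value, ty, heap, marked):
    """Mark everything reachable from `value`, using its type `ty` as a guide."""
    if ty in SCALARS or value is None:      # scalars and nil hold no pointers
        return
    if value in marked:                     # already visited (shared structure)
        return
    marked.add(value)
    kind, elem = ty
    fields = heap[value]
    if kind == "list":                      # cons cell: [head, tail]
        head, tail = fields
        mark(head, elem, heap, marked)
        mark(tail, ty, heap, marked)
    elif kind == "array" and elem not in SCALARS:
        for field in fields:                # flat scalar arrays are never scanned
            mark(field, elem, heap, marked)

def collect(roots, heap):
    """roots: (value, reconstructed type) pairs. Sweeps unmarked cells in place."""
    marked = set()
    for value, ty in roots:
        mark(value, ty, heap, marked)
    for addr in list(heap):
        if addr not in marked:
            del heap[addr]
    return marked

# Address 1 holds a dead cell; the scalar 1 stored in the live list is not
# mistaken for a pointer to it because its type says it is an int.
heap = {1: [99, None], 100: [1, 101], 101: [2, None], 200: [7, 8, 9]}
roots = [(100, ("list", "int")), (200, ("array", "int"))]
marked = collect(roots, heap)
print(sorted(heap))
```

Note that no header or pointer bit is consulted anywhere: the type alone decides whether a word is followed, which is also what allows the scalar array to be skipped without scanning its elements.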


5.4.3 Object-based I/O


Another application that may benefit from run-time type information is I/O. Most program- 
ming languages offer either stream-based or continuation-passing I/O primitives for a few basic 
datatypes that may be used to build more complex read/write functions explicitly (e.g., C, Pascal,
Haskell). Typically, I/O formats and styles for complex objects are directly controlled by the 
user. Polymorphic objects are handled using explicitly parameterized I/O routines. With run- 
time availability of type information, I/O handling for complex (even polymorphic) objects can 
be made automatic. The structure of an object may be directly determined from its type. For 
fixed sized objects, the size of the object may also be ascertained from its type. For dynamically 
sized arrays, the size information may be kept within the object itself. Given this information, 
an entire complex object may be read or written easily using its type to select and guide the 
output format. 
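
As a rough illustration of type-guided output, the following Python sketch writes an untagged object using only its type. The heap model (a dictionary from hypothetical integer addresses to cons cells, with None for nil), the function name show, and the output syntax are assumptions made for this example; they are not part of the Id run-time system.

```python
def show(value, ty, heap):
    """Render an untagged `value` as text, guided only by its type `ty`."""
    if ty == "int":
        return str(value)
    if ty == "bool":
        return "true" if value else "false"
    kind, elem = ty                          # structured type: (constructor, element type)
    if kind == "list":
        items = []
        while value is not None:             # follow the spine until nil
            head, value = heap[value]
            items.append(show(head, elem, heap))
        return "(" + ":".join(items + ["nil"]) + ")"
    raise ValueError(f"no output format for type {ty!r}")

# The same untagged words print differently under different types.
heap = {100: [1, 101], 101: [0, None]}
print(show(100, ("list", "int"), heap))
print(show(100, ("list", "bool"), heap))
```

The two calls read identical heap cells; only the supplied type changes, which is why the exact reconstructed type is needed before a polymorphic object can be written out.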

The run-time systems of dynamically-typed polymorphic languages such as Lisp or Smalltalk 
usually offer such I/O capability automatically for each user-defined data-structure within the 
program. This is possible because all objects in such languages carry type-tags which may be 
used to guide the generic I/O functions according to the structure of that object. With type 
reconstruction, this capability may also be provided in a statically-typed language with an
untagged object model. Moreover, just like tagless garbage collection, it may also be possible
to generate object-based I/O routines that are specialized to a given object type and hence are 
more efficient than generic I/O routines that interpret the reconstructed type at run-time. 

Another possible use of complete run-time type reconstruction and object-based I/O is 
in periodic check-pointing of the entire machine state for long-running programs. Complete 
type reconstruction would enable traversal and recording of all the dynamic data-structures 
participating in the computation including the activation stack, the global environment, and 
all the accessible objects residing on the heap. 


5.5 Outline 


In the rest of Part II we study the problem of complete run-time type reconstruction for Id
programs in detail and describe some of its applications implemented within the Id run-time 
system. In Chapter 6, we intuitively analyze the problem of polymorphic type reconstruction 
by means of examples, describe the compiler and run-time system support required and outline 
a reconstruction algorithm. Chapter 7 formalizes these ideas in the context of the Kernel Id 
intermediate language, presents a complete reconstruction algorithm, and proves its correct- 
ness. Finally, in Chapter 8 we present tagless garbage collection as an application of complete 
type reconstruction and compare its performance with a conservative garbage collector and a 
compiler-directed explicit allocation/deallocation scheme.
Chapter 6


Compiler-directed Polymorphic 
Type Reconstruction 


In this chapter, we informally present the problem of complete run-time type reconstruction for 
higher-order, polymorphic languages such as Id and discuss some of its solutions. In Section 6.1, 
we briefly describe the problem via examples and discuss why the existing approaches are 
insufficient to guarantee complete run-time type reconstruction. In Section 6.2, we provide the 
basic framework for doing complete type reconstruction, characterizing the analysis required at 
compile-time and the reconstruction strategy to be followed at run-time. Next, in Section 6.3 
we present a compilation scheme that identifies and inserts the necessary type information 
within the user program to guarantee complete type reconstruction at run-time. Subsequently, 
Section 6.4 walks through a reconstruction example. In Section 6.5, we show a series of compiler 
optimizations and variations on our compilation scheme that may further reduce the book- 
keeping overhead of the current scheme. Finally, in Section 6.6 we point to two implementations 
of our type reconstruction strategy. 


6.1 Type Reconstruction Problem 


The problem of type reconstruction for Id can be described as follows. At some point during the 
execution of a program, we wish to take a snapshot of the state of the machine and determine 
the type of every object accessible within the computation. We assume that the program is 
typed statically and that the run-time environment does not maintain any type information 
implicitly. In particular, Id run-time objects do not carry any type-tags. 

Clearly, only polymorphic objects and functions pose some challenge; complete type infor- 
mation can be obtained at compile-time for monomorphic objects. Also note that the exact 
nature of the desired information depends on the application that uses it. For example, a 
source debugger may wish to inspect any particular object from the current run-time state of 
the machine whereas a garbage collector only needs to traverse those that are still in use. Also, 
most garbage collectors only need to differentiate between scalars and pointers to structures 
while a source debugger needs exact type information in order to display the object properly. 
In general, we would like to devise a flexible strategy that can be optimized according to the 
level of information desired. 
6.1.1 Basic Type Reconstruction Scheme


Usually, the compile-time type of an object is a good starting point for the reconstruction 
of its run-time type. In the case of polymorphic functions, the types of the objects contained
within the function body would depend on the types of the arguments that it receives at a 
given application site. Appel [App89] first noted that if the exact types of the arguments of 
a polymorphic function were known at run-time, then its entire body could be instantiated 
appropriately using its compile-time typing. The exact types of the arguments present at an 
application site may, in turn, be determined by reconstructing the type of the parent’s body 
containing that application site and so on. 

Goldberg [Gol91] made the above ideas more concrete in the context of tagless garbage 
collection for strongly-typed, sequential languages. Although his scheme applied specialized 
garbage collection routines to heap objects directly without explicitly reconstructing their types, 
the basic mechanism of type reconstruction remains the same and may be described as follows 
in the context of parallel program execution: 


Compile-time support: 
1. The program is type-checked completely. 


2. For each user-defined function within the program, the types of all its arguments and 
the types of its local and free variables are recorded in a type-map. This type-map 
serves as a static template for the function’s run-time activation frame. 


3. For each function application site, the full static type instantiation of the function 
being applied is also recorded within the type-map of the enclosing function defini- 
tion. 


Program invocation and execution: 


1. The top-level expression is type-checked and the types of its command-line arguments 
are recorded. 


2. The top-level expression is now executed, expanding the run-time state of the ma- 
chine into a tree of activation frames (a stack of activation frames in a sequential 
language). Each function application evaluates in the context of its own activation 
frame which stores its actual arguments and saves the values of temporary local 
computations. 


3. The machine may be halted at any point during execution and type reconstruc- 
tion may be requested for a particular frame present within the current dynamic 
activation tree (the activation stack in a sequential language). 


Run-time type reconstruction:


1. First, the function corresponding to the current activation frame is identified and its 
static type-map is obtained. 


2. If the current function is not polymorphic then no type reconstruction is required. 
Otherwise, its parent activation frame and application site are identified using the 
return address information in the current frame. 


3. If the parent activation frame is the root of the dynamic activation tree then the 
exact types of the arguments supplied to the current function are already known. 
Otherwise, the process of type reconstruction is repeated for the parent frame by 
going back to Step 1. 
[Figure 6.1: The Run-time State of Computation in Example 6.1.]


4. Given the exact types of the arguments of a function, its static type-map is fully 
instantiated by matching the actual types of the arguments to their static types. 
This reconstructs the current activation frame and also provides the exact types of 
the arguments present at any application sites within the body of that function. 


As shown above, the reconstruction process may continue possibly up to the root of the 
activation tree where the run-time types of the user-supplied arguments are available. At that 
point, all polymorphic functions in the call chain can be correctly instantiated revealing the 
run-time types of their internal objects. In the context of sequential execution, Goldberg [Gol91] 
also showed that the entire state of the machine may be reconstructed in one pass by starting 
from the root frame at the bottom of the activation stack and working towards the most recent 
frame at the top of the activation stack. 
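
The run-time steps above can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the algorithm of Appel or Goldberg as published: types are nested tuples, type variables are strings with a leading "?", and a hypothetical Frame record stands in for an activation frame together with the relevant parts of its type-map. The frames at the end encode the call chain of Example 6.1 by hand.

```python
from dataclasses import dataclass, field
from typing import Optional

# Types: a str is a base type ("int") or, with a leading "?", a type variable
# ("?t0"); a tuple is a constructor applied to arguments, e.g. ("list", "?t1").

def instantiate(ty, subst):
    """Replace the type variables in `ty` by their bindings in `subst`."""
    if isinstance(ty, tuple):
        return (ty[0],) + tuple(instantiate(arg, subst) for arg in ty[1:])
    return subst.get(ty, ty)

def match(static, actual, subst):
    """Bind the variables of `static` so that it equals the exact type `actual`."""
    if isinstance(static, str) and static.startswith("?"):
        if subst.setdefault(static, actual) != actual:
            raise TypeError(f"conflicting bindings for {static}")
    elif isinstance(static, tuple):
        if (not isinstance(actual, tuple) or static[0] != actual[0]
                or len(static) != len(actual)):
            raise TypeError(f"cannot match {static} against {actual}")
        for s, a in zip(static[1:], actual[1:]):
            match(s, a, subst)
    elif static != actual:
        raise TypeError(f"cannot match {static} against {actual}")

@dataclass
class Frame:
    function: str
    param_types: list                       # static parameter types (type-map)
    parent: Optional["Frame"] = None        # found via the return address
    # Static types of the actual arguments, as recorded for the application
    # site in the parent's type-map (expressed in the parent's type variables).
    site_arg_types: list = field(default_factory=list)

def reconstruct(frame):
    """Return the bindings of the type variables of `frame`'s function."""
    if frame.parent is None:                # root frame: exact types are known
        return {}
    parent_subst = reconstruct(frame.parent)            # Step 3
    actual = [instantiate(t, parent_subst) for t in frame.site_arg_types]
    subst = {}
    for static, exact in zip(frame.param_types, actual):  # Step 4
        match(static, exact, subst)
    return subst

# Call chain of Example 6.1: root -> map -> enlist.
root = Frame("root", [])
map_frame = Frame("map", [("->", "?t1", "?t2"), ("list", "?t1")], parent=root,
                  site_arg_types=[("->", "int", ("list", "int")), ("list", "int")])
enlist_frame = Frame("enlist", ["?t0"], parent=map_frame, site_arg_types=["?t1"])
print(reconstruct(enlist_frame))
```

The recursion climbs the dynamic chain on demand, which suits a debugger that inspects one frame at a time; a one-pass traversal from the root, as in the sequential case described above, would compute the same bindings for every frame.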

We illustrate the above reconstruction scheme with a small example:¹


Example 6.1:
def enlist x_{t0} = x:nil;
def map f nil = nil
 |  map f (y:ys)_{(list t1)} = (f y_{t1}):(map f ys);
map enlist (1:2:nil)_{(list int)};


¹All the examples in this chapter use the Id language syntax [Nik91]. Briefly, functions are introduced with
a def keyword and allow pattern-matching on their arguments. (:) is the infix cons operation.
The function enlist has a static type $\forall t_0.\ t_0 \rightarrow (list\ t_0)$ and map has a static type
$\forall t_1 t_2.\ (t_1 \rightarrow t_2) \rightarrow (list\ t_1) \rightarrow (list\ t_2)$. We also show the type instances of some internal identifiers as
subscripts. The evaluation of the top-level expression (map enlist (1:2:nil)) dynamically
unfolds into a tree of activation frames as shown in Figure 6.1.

If we wish to examine the x argument of enlist during one of these calls, then the run-time
instantiation of its static type $t_0$ may be determined by following up the dynamic chain of
activation frames into its application site within the map function. Here, $t_0$ may be related to
the static type $t_1$ of the actual argument y at that application site. This relates to the type of
the second argument $(list\ t_1)$ of map which is found to be (list int) at the root application site.
Then, both $t_1$ and $t_0$ can be instantiated to int giving the actual type of x as desired.


6.1.2 Problems with Closures and Free Variables 


Unfortunately, the above scheme is incomplete. Goldberg and Gloger [GG92] noted that some- 
times types of objects hidden inside a closure are impossible to reconstruct. Consider the 
following example: 


Example 6.2: 
def f2 x_{t0} y_{t1} = y;
g2 = if ... then f2 1_{int} else f2 "foo"_{string};


g2 2;


Here, f2 has a type $\forall t_0 t_1.\ t_0 \rightarrow t_1 \rightarrow t_1$, and therefore g2 gets bound to a partially applied
function closure with type $\forall t_2.\ t_2 \rightarrow t_2$ that says nothing about the type of the data hidden inside
it. In fact, this type cannot be determined at compile-time because it depends on the value of
the predicate (...). Besides, during the evaluation of (g2 2) the return address information on
the call stack would point to the application site of g2, which does not help in determining the
contents of that closure either. Thus, we cannot reconstruct the type of the argument x within
the activation of f2 because the computation that created its closure is no longer available as
part of the dynamic activation tree.

It may appear that this problem arises only when an argument of a function is never used
within its body, but the following example adapted from [GG92] shows that this is not the
case:²


Example 6.3:
def f3 x_{(list t0)} =
  { def h3 z_{t1} = if length x_{(list t0)} == 1
                    then z:nil
                    else z:z:nil;
    in h3 };
g3 = if ...
     then f3 (1:nil)_{(list int)}
     else f3 (true:nil)_{(list bool)};


g3 2_{int};


Here, the type of the function f3 is $\forall t_0 t_1.\ (list\ t_0) \rightarrow t_1 \rightarrow (list\ t_1)$, and therefore the type of
the computed closure g3 is $\forall t_2.\ t_2 \rightarrow (list\ t_2)$. During the evaluation of the application (g3 2),
no information is available in the activation tree as to whether this closure contains a list of booleans
or a list of integers. Goldberg and Gloger argue in [GG92] that since h3 does not use the
elements of its free variable list x but only its spine (to compute its length), a garbage collector
can ignore these elements and copy just the spine. But this approach creates problems if these
structures are shared in many places, and it is quite unsatisfactory for a source debugger that
needs to display the full object.


²In Id syntax, a block-expression (bounded by {}) encloses a set of identifier bindings. The result of such a
block is the value of the expression following the keyword in, evaluated within the scope of the bindings.

The problem of not being able to reconstruct the exact type of an object as shown above 
does not appear all the time. For instance, the type of argument z within h3 in the above 
example may be reconstructed to the type int by traversing up the call stack to its application 
site (g3 2). In fact, functions like map in Example 6.1 never have this problem: 


Example 6.4: 
g4 = (map enlist)_{(list t0) → (list (list t0))};
g4 (1:2:nil)_{(list int)};


Even though here map is partially applied to enlist to yield a closure g4 with type
$\forall t_0.\ (list\ t_0) \rightarrow (list\ (list\ t_0))$, we have not lost any type information. Instantiation of $t_0$ to
int at the call site of g4 yields complete type information about all the internal identifiers of
both map and enlist. The problem with Examples 6.2 and 6.3 is that sometimes the types 
of closures do not have any connection with the types of objects hidden inside them. In such 
cases, we are in danger of losing type reconstruction information because the closure creation 
site may no longer be available on the call stack. 
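
The observation that a closure's type may have no connection with the types of the objects hidden inside it can be made concrete with a small check. The Python sketch below is our own rough illustration, not the type conservation criterion established later: it simply reports the type variables that occur in the types of the captured objects but not in the type of the closure itself. Types are nested tuples, type variables are strings with a leading "?", and the function names are hypothetical. Types are written with the type variables of the function being partially applied.

```python
def type_vars(ty):
    """Collect the type variables (strings starting with "?") occurring in `ty`."""
    if isinstance(ty, tuple):
        found = set()
        for arg in ty[1:]:
            found |= type_vars(arg)
        return found
    return {ty} if ty.startswith("?") else set()

def unrecoverable_vars(closure_type, captured_types):
    """Variables of captured objects that the closure's own type never mentions."""
    captured = set()
    for ty in captured_types:
        captured |= type_vars(ty)
    return captured - type_vars(closure_type)

# Example 6.2: f2 applied to x; the closure has type t1 -> t1 and hides x : t0.
g2 = unrecoverable_vars(("->", "?t1", "?t1"), ["?t0"])
# Example 6.3: h3 has type t1 -> (list t1) and hides x : (list t0).
g3 = unrecoverable_vars(("->", "?t1", ("list", "?t1")), [("list", "?t0")])
# Example 6.4: map applied to enlist; the closure hides f : t1 -> t2.
g4 = unrecoverable_vars(("->", ("list", "?t1"), ("list", "?t2")),
                        [("->", "?t1", "?t2")])
print(g2, g3, g4)
```

A non-empty result flags a closure whose call site cannot determine the hidden types, as in Examples 6.2 and 6.3; the empty result for Example 6.4 reflects that instantiating the closure's type also instantiates everything it captures. The check deliberately ignores captured objects whose own types are universally quantified, which need no extra information.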

Another interesting point is that polymorphic objects with universally quantified types do 
not pose this problem. The run-time type of such an object cannot be more specific than its 
compile-time definition type. For instance, in the following example the variable x within the 
body of f5 has the universally quantified type $\forall t_0.\ (list\ t_0)$.


Example 6.5: 
def f5 y =
  { x = nil;
    def h5 z_{t1} = if length x_{(list t0)} == 1
                    then z:nil
                    else z:z:nil;
    in h5 };


Now, there is no question about the contents of the closure formed by h5 over its free 
variable x. It can never contain an object whose type is more specific than $\forall t_0.\ (list\ t_0)$. For
our purposes, this means that the compile-time type of a polymorphic object provides sufficient 
information for its run-time type reconstruction. 


6.1.3 Discussion


The examples presented above attempt to provide an intuitive understanding of the process of 
type reconstruction. It appears that for some polymorphic functions we are able to infer type 
reconstruction information from the parent-child relationships embedded in the activation tree 
while for others we need additional information at run-time for complete type reconstruction. 
Now we can characterize the problem of type reconstruction more concretely: 


1. First, we need to identify and record all the compile-time type information necessary for 
type reconstruction. We also need a criterion to identify what additional type information, 
if any, needs to be carried at run-time for complete type reconstruction of polymorphic
functions (Section 6.2). 


2. Next, we need a compilation scheme that transforms the given program into one that 
generates and propagates the additional type information (Section 6.3). 


3. Finally, we need a type reconstruction algorithm that uses the explicit and implicit type 
information at run-time and reconstructs the exact type of all run-time objects (Sec- 
tion 6.4). 


6.2 Type Reconstruction Framework 


In this section, we discuss the general framework for run-time type reconstruction. First, 
we describe the run-time execution model of Id programs. Using this model, we formulate 
a strategy for reconstructing the complete run-time machine state. Finally, we identify the 
essential information that needs to be recorded at compile-time and establish a type conservation 
criterion that guarantees complete run-time type reconstruction. 


6.2.1 Run-time Model of Program Execution 


Id is a non-strict, implicitly parallel language with an eager evaluation strategy. Below, we 
summarize the execution model of a Kernel Id program. 

A program in Kernel Id consists of an expression query to be evaluated within the scope of 
a set of top-level value bindings and type declarations. Typically, this evaluation is carried out 
in several phases as described below: 


Compile-time — First, the top-level bindings and type declarations are type-checked giving 
rise to the global static environment. This environment records the exact types of all 
global identifiers. Subsequently, all top-level value bindings, datatype constructors, and 
internal function definitions are compiled into independent code-blocks. 


Link/Load-time — All code-blocks are loaded and linked into the program memory giving 
rise to the global dynamic environment. 


Invocation-time — The top-level expression query is type checked in the global static en- 
vironment and then compiled into a root code-block. At this point, exact types for all 
local and free identifiers used in the query expression are known. The global static and 
dynamic environments together with the typed root code-block for the query expression 
constitute the complete initial state of the machine. 


Run-time — A code-block always executes in the context of an activation frame which 
records the actual arguments bound to its formal parameters, the run-time objects bound 
to its free identifiers, and the values of all its local identifiers during execution. An 
activation frame is allocated at the time of a function application and it is deallocated 
when that function terminates. In a sequential system, an activation frame corresponds 
to the stack frame of the currently executing function. In the parallel execution model of 
Id, the run-time stack generalizes to a tree of activation frames as shown in Figure 6.2. 


The program starts execution by allocating an activation frame for the root code-block 
recording its actual arguments and local identifiers. Subsequent function invocations 
extend the dynamic activation tree with their own activation frames, executing in parallel 
[Figure 6.2: The Parallel Execution Model for Id. Panels: Tree of Activation Frames (spread across computation nodes); Global Heap of Shared Objects (spread across memory nodes).]


with their parent activation. Shared objects are allocated on a separate global heap 
and are accessible via pointers from the activation frame (see Figure 6.2). Thus, at any 
time during execution, the complete run-time state of the machine consists of the global 
dynamic environment, the tree of active or suspended activation frames, and all the heap 
objects accessible through the global identifiers or the activation frames. This is the state 
of the machine we are interested in reconstructing. 


6.2.2 Type Reconstructibility


Starting from the initial state of the machine as described above, we can view type recon- 
structibility as an invariant condition to be maintained at each subsequent evaluation step 
that modifies the run-time state of the machine. We identify two kinds of state modifications: 
intra-procedural, and inter-procedural. 

The intra-procedural modifications to the state of the machine are due to the computation 
within a code-block: accessing values of function parameters and free identifiers to compute 
local values, allocating heap objects, modifying global or heap objects etc. Since our language 
has a sound type system, type-correct programs are guaranteed not to produce run-time type- 
errors or to compute values that are type-inconsistent. This implies that any value bound to 
an identifier in a given code-block must be consistent with the exact type of that identifier, 
otherwise it could lead to a run-time type-error. This is true even for identifiers bound to 
mutable objects. In other words, the actual values of mutable identifiers and heap objects could 
change due to side-effects, but the types of those values would remain the same. Therefore, 
once the exact types of all identifiers present within a code-block are determined, they serve to 
identify the exact types of all the values computed and the heap objects allocated within the
code-block over its entire life-time. 

The inter-procedural modifications to the state of the machine take place at a function 
application or return. A function application introduces a new activation frame that binds a 
new set of local identifiers and points to the heap objects allocated within the function. We need 
to ensure that the exact types of these new local identifiers and heap objects are reconstructible 
on the basis of the existing state of the machine before the function application. 

The above discussion suggests that an activation frame is an appropriate unit of type recon- 
struction. The entire state of the machine may be reconstructed by induction on the structure 
of the dynamic activation tree. As the base step, the exact types of all objects in the root 
activation frame are already known at the start of the program. The inductive step is to ensure 
that at every function application site that expands the dynamic activation tree, the type of 
every slot in its activation frame can be identified and correctly instantiated. Below, we analyze 
the compile-time information required to achieve this. 


6.2.3 Recording Compile-time Type Information 


In Section 6.1.1, we informally introduced the concept of the type-map of a function that was 
used as a static template during its type reconstruction. Below, we make that definition more 
concrete: 


Definition 6.1 (Type-map) Given a function $f = \lambda x_1 \cdots x_n.E$ with free identifiers $\{z_1 \cdots z_m\} = \mathcal{F}(\lambda x_1 \cdots x_n.E)$ and locally bound identifiers $\{y_1 \cdots y_l\} = \mathcal{B}(E)$, its type-map denoted by $TM_f$ 
records the following information: 

1. The function type, $f : \tau_1 \to \cdots \to \tau_n \to \tau_{n+1}$. 

2. The types of all the function parameters $x_1 : \tau_1, \ldots, x_n : \tau_n$. 

3. The type-schemes of all the free identifiers of the function, $z_1 : \sigma_{z_1}, \ldots, z_m : \sigma_{z_m}$. 

4. The type-schemes of all the locally bound identifiers $y_1 : \sigma_{y_1}, \ldots, y_l : \sigma_{y_l}$. 

5. The type-instance of the function identifier $g$ at all application sites $(g\ a_1 \cdots a_k)$ within 
the function body $E$. We also record whether an application site has been statically 
determined to be a full-arity application site. 


A type-map records the static types of all the parameters, the free identifiers, and the local 
identifiers of a code-block along with some additional type information about its internal call 
sites. It is essentially a mapping from the frame slots of a code-block’s activation frame to 
their static types. The type-map $TM_f$ is parameterized by the set of all its free type-variables 
$\mathcal{F}(TM_f)$. This set exactly captures the missing information in the static type environment of 
a function that needs to be instantiated at run-time. 

We generate static type-maps for all code-blocks within the program at compile-time. These 
templates are then linked together with the compiled object code and may be accessed at 
run-time using the name of the code-block. As an example, Figure 6.3 shows the Kernel Id 
translation and the type-map for the map function of Example 6.1. 
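To make the shape of a type-map concrete, the following is a minimal Python sketch of the five items of Definition 6.1, filled in for the map function. The tuple representation of types, the `TypeMap` class, and its field names are illustrative assumptions and are not the data-structures used by the Id compiler; quantifiers of type-schemes are omitted for brevity.

```python
from dataclasses import dataclass, field

# Assumed representation (not from the thesis): a type-variable is a string
# such as "t0"; any other type is a tuple (constructor, arg_1, ..., arg_n).
def fn(a, b):
    return ("->", a, b)

def lst(a):
    return ("list", a)

BOOL = ("bool",)

def free_type_vars(ty):
    """Collect the type-variables occurring in a type."""
    if isinstance(ty, str):
        return {ty}
    found = set()
    for arg in ty[1:]:
        found |= free_type_vars(arg)
    return found

@dataclass
class TypeMap:
    signature: tuple                                 # item 1
    params: dict                                     # item 2
    free_ids: dict = field(default_factory=dict)     # item 3 (quantifiers omitted)
    local_ids: dict = field(default_factory=dict)    # item 4
    call_sites: list = field(default_factory=list)   # item 5: (callee, instance, full_arity)

    def free_vars(self):
        """Type-variables that must be instantiated at run-time."""
        types = [self.signature]
        types += list(self.params.values())
        types += list(self.free_ids.values())
        types += list(self.local_ids.values())
        types += [instance for _, instance, _ in self.call_sites]
        found = set()
        for ty in types:
            found |= free_type_vars(ty)
        return found

MAP_SIG = fn(fn("t0", "t1"), fn(lst("t0"), lst("t1")))
MAP_TM = TypeMap(
    signature=MAP_SIG,
    params={"f": fn("t0", "t1"), "l": lst("t0")},
    local_ids={"p": BOOL, "l1": lst("t1"), "x": "t0", "xs": lst("t0"),
               "y": "t1", "ys": lst("t1"), "l2": lst("t1")},
    # The call of the unknown function f is not assumed to be full-arity.
    call_sites=[("f", fn("t0", "t1"), False), ("map", MAP_SIG, True)],
)

print(sorted(MAP_TM.free_vars()))
```

In this sketch the free type-variables of the whole type-map coincide with those of the signature of map, which is what the later test for type conservation relies on.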


6.2.4 The Principle of Type Conservation 


Consider a first-order application site for a function that does not have any free identifiers. We 
can reconstruct the types of all objects in its activation frame using the basic type reconstruction 
scheme described in Section 6.1.1. We assume that the name of the callee function can be 
identified from its current activation frame which also identifies its static type-map. The return 

Kernel Id definition:

    def map f l =
      { p = nil? l;
        l1 = if p then nil
             else
               { x = hd l;
                 xs = tl l;
                 y = f x;
                 ys = map f xs;
                 l2 = y : ys;
               in l2 }
      in l1 };

Type-map:

    Fn. Signature         (t0 -> t1) -> (list t0) -> (list t1)

    Arguments             f     (t0 -> t1)
                          l     (list t0)

    Local Frame Slots     p     bool
                          l1    (list t1)
                          x     t0
                          xs    (list t0)
                          y     t1
                          ys    (list t1)
                          l2    (list t1)

    Internal Call Sites   f     (t0 -> t1)
                          map   (t0 -> t1) -> (list t0) -> (list t1)

Figure 6.3: Kernel Id definition and the Type-map of map function. 


address information stored within the frame identifies the caller’s activation frame and the exact 
application site within the caller’s body that gave rise to the call. Assuming that the caller’s 
frame has been reconstructed recursively, the exact type instantiation of the callee function 
recorded within the type-map of the caller (Item 5 in Definition 6.1) provides the exact types 
of all the arguments passed to the callee at this application site. Now, the callee function’s 
type-map may be instantiated by matching the types of the actual arguments with the types 
of the parameters recorded in the callee’s type-map. 
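The matching step for a first-order application site can be sketched as one-way matching of the callee's static signature against the type instance recorded at the caller's application site. The Python below is only an illustration under assumed names (`match`, `instantiate`) and an assumed tuple representation of types; it is not the reconstruction algorithm of Chapter 7.

```python
# Assumed representation: a type-variable is a string; any other type is a
# tuple (constructor, arg_1, ..., arg_n).
INT, BOOL = ("int",), ("bool",)

def lst(elem):
    return ("list", elem)

def fn(a, b):
    return ("->", a, b)

def match(pattern, actual, subst):
    """Extend subst so that pattern, once instantiated, equals actual."""
    if isinstance(pattern, str):                      # a type-variable
        if subst.setdefault(pattern, actual) != actual:
            raise TypeError(f"conflicting instantiations for {pattern}")
        return subst
    if (isinstance(actual, str) or pattern[0] != actual[0]
            or len(pattern) != len(actual)):
        raise TypeError(f"cannot match {pattern} against {actual}")
    for p, a in zip(pattern[1:], actual[1:]):
        match(p, a, subst)
    return subst

def instantiate(ty, subst):
    """Replace the type-variables of ty according to subst."""
    if isinstance(ty, str):
        return subst.get(ty, ty)
    return (ty[0],) + tuple(instantiate(arg, subst) for arg in ty[1:])

# Static signature of the callee (map) and a hypothetical type instance
# recorded for it at some application site in the caller's type-map.
callee_sig = fn(fn("t0", "t1"), fn(lst("t0"), lst("t1")))
site_instance = fn(fn(INT, BOOL), fn(lst(INT), lst(BOOL)))

subst = match(callee_sig, site_instance, {})
print(subst)                          # instantiation of t0 and t1
print(instantiate(lst("t1"), subst))  # exact type of a local slot such as ys
```

Once the substitution is known, every slot type in the callee's type-map can be instantiated with it, which is all that the first-order case requires.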

Unfortunately, not all application sites are first-order, since our language allows higher-order 
functions and partial applications (currying). As shown in Figure 6.4, partial applications create 
function closures that simply record the supplied argument in a closure data-structure instead 
of creating a new activation frame right away. The type of such closures may not provide 
sufficient information regarding the type of the arguments captured within the closure (e.g., 
closure g2 of Example 6.2). Some functions refer to free identifiers that must also be recorded 
in a closure at the point of their definition* (e.g., function h3 of Example 6.3). The types of 
such free identifiers may not be reflected in the overall type of the function closure either. 

In a higher-order language such as Id, function closures are first-class objects, i.e., they may 
be stored into heap data-structures, passed as arguments to other functions, and returned as 
values from the function that created them. Therefore, the function definition site or the partial 
application site that creates a closure is not guaranteed to be accessible when that closure is 
used in further computation. As shown in Figure 6.4, such application sites are termed as 


*Lambda-lifting transformation [Joh85] may be used to lift nested functions with free identifiers into top-level 
super-combinators that refer to only top-level identifiers. But, this transformation restricts the type polymorphism 
of free identifiers and does nothing to change a higher-order program into a first-order program. Therefore, 
we choose to deal with the problem of free identifiers directly. 

Figure 6.4: Visible and Invisible Application Sites. 


invisible. A function closure expands into an activation frame only when all its arguments have 
been accumulated. This final application site of the closure is termed as the visible application 
site because its position may be determined by examining the return address stored within the 
expanded activation frame. 

The type reconstruction scheme outlined above for first-order function applications would 
work with higher-order function closures only if the closure type instantiation recorded at the 
final application site has sufficient type information to instantiate the types of all the free 
identifiers and previous arguments accumulated within the closure. Such a function is called 
type-conserving. This is a static property of a function’s type signature and is characterized in 
the definition given below. On the other hand, if a function does not satisfy the above property, 
then some type information may be lost at its definition site or its invisible partial application 
sites. We also identify such information in the following definition for each of the invisible 
application sites: 


Definition 6.2 (Type Conservation) Given a function $f$ with arity $k$, type-map $TM_f$, and 
type-scheme $\forall \alpha_1 \cdots \alpha_n.\ \tau_1 \to \cdots \to \tau_k \to \tau_{k+1}$: 

1. The type-variables $\mathcal{F}(TM_f) \setminus \mathcal{F}(\tau_1 \to \cdots \to \tau_k \to \tau_{k+1})$ are defined as not being conserved 
at the function definition site. 

2. The type-variables $\mathcal{F}(\tau_i) \setminus \mathcal{F}(\tau_{i+1} \to \cdots \to \tau_k \to \tau_{k+1})$ $(1 \le i < k)$ are defined as not being 
conserved at its $i$-th application site. 

3. The type-variables $\mathcal{F}(\tau_k \to \tau_{k+1})$ are defined as being conserved at the final ($k$-th) 
application site. 

4. The function $f$ is said to be type-conserving if all the type-variables in its type-map are 
being conserved, i.e., $\mathcal{F}(TM_f) = \mathcal{F}(\tau_k \to \tau_{k+1})$. 


Informally, a type-conserving function can correctly instantiate its entire type-map with just 
the run-time type of its final application closure. It is easy to check that map and enlist from 
Example 6.1 are type-conserving, while f2 from Example 6.2, and f3 and h3 from Example 6.3 
are not, which is why we were losing type information in those cases. 
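The four clauses of the type conservation test are simple set computations over free type-variables. The following Python sketch shows one way they could be evaluated; the function name `conservation`, its argument layout, and the tuple representation of types are assumptions made for illustration only.

```python
# Assumed representation: a type-variable is a string; any other type is a
# tuple (constructor, arg_1, ..., arg_n).
INT = ("int",)

def lst(elem):
    return ("list", elem)

def ftv(ty):
    """Free type-variables of a type."""
    if isinstance(ty, str):
        return {ty}
    found = set()
    for arg in ty[1:]:
        found |= ftv(arg)
    return found

def conservation(typemap_vars, arg_types, result_type):
    """Evaluate the clauses of the type conservation test.

    arg_types is [tau_1, ..., tau_k] and result_type is tau_{k+1}.
    """
    k = len(arg_types)

    def tail(i):
        # F(tau_i -> ... -> tau_k -> tau_{k+1}) for a 1-based index i
        found = ftv(result_type)
        for ty in arg_types[i - 1:]:
            found |= ftv(ty)
        return found

    lost_at_definition = set(typemap_vars) - tail(1)               # clause 1
    lost_at_site = {i: ftv(arg_types[i - 1]) - tail(i + 1)         # clause 2
                    for i in range(1, k)}
    conserved = tail(k)                                            # clause 3
    is_conserving = set(typemap_vars) == conserved                 # clause 4
    return lost_at_definition, lost_at_site, conserved, is_conserving

# h3 : t1 -> (list t1), whose type-map also mentions t0 through x : (list t0)
print(conservation({"t0", "t1"}, ["t1"], lst("t1")))

# map : (t0 -> t1) -> (list t0) -> (list t1)
print(conservation({"t0", "t1"}, [("->", "t0", "t1"), lst("t0")], lst("t1")))

# A three-argument function over three unrelated lists, returning int
print(conservation({"t1", "t2", "t3"},
                   [lst("t1"), lst("t2"), lst("t3")], INT))
```

The first call reports t0 as lost at the definition site of h3, the second confirms that map is type-conserving, and the third shows type-variables being lost at the first and second partial application sites.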

Definition 6.2 may be used by a compiler to detect functions that are not type-conserving. 
Furthermore, the definition shows exactly how much type information is lost at each application 
site. The next question is what type reconstruction strategy should be devised for such func- 
tions? Our scheme is to make every function closure self-sufficient, which means that a closure 
for a non-type-conserving function must carry exact run-time encodings of its non-conserved 
types. We describe our compilation scheme in the next section. 


6.3 Compiler Support for Type Reconstruction 


In this section, we informally describe a compilation scheme that analyzes every function in 
the program and transforms it to generate and propagate exact run-time type instantiations of 
its non-conserved type-variables where necessary. These encoded type-hints are inserted at the 
partial application sites that otherwise do not preserve this information and are deposited into 
the function’s activation frame at the time of its final application. These type-hints may then 
be used to reconstruct the exact type instantiations of the non-conserved type-variables for the 
current activation frame of the function. 

It is interesting to note that the propagation of type information from closure creation 
sites to their final application sites for non-type-conserving functions may be formulated as 
an overloading resolution problem which may then be handled using well-known techniques in 
the literature [Gup90, PJW92, WB89]. These techniques systematically translate overloading 
into parametric polymorphism by replacing unresolved instances of overloaded identifiers in a 
function with additional parameters that are supplied at its application site. In our scheme, 
these parameters are the explicit type-hints that are used by the type reconstruction algorithm. 

Below, we intuitively describe our compilation strategy by means of examples. We also 
provide a simplified but self-contained description of overloading resolution and translation 
mechanism as applied to type reconstruction. The full details of this transformation and the 
subsequent reconstruction process appear in Chapter 7. 


6.3.1 Detecting Violations of Type Conservation 


The first step in our compilation process is to identify the functions in the program that 
may require additional type-hints for the non-conserved type-variables in their type-map. This 
is straightforward given the test for type conservation in Definition 6.2. First, we type-check 
each function f in the program and generate its type-map $TM_f$ according to Definition 6.1. 
Then, using Definition 6.2 we determine which type-variables in its type-map, if any, are not 

Kernel Id definition:

    def h3 z =
      { y0 = length x;
        y1 = y0 == 1;
        y2 = if y1 then
               { y3 = z:nil;
               in y3 }
             else
               { y4 = z:nil;
                 y5 = z:y4;
               in y5 }
      in y2 };

Type-map:

    Type Parameters       t0, t1

    Fn. Signature         t1 -> (list t1)

    Arguments             z        t1

    Free Identifiers      x        (list t0)
                          length   ∀t2. (list t2) -> int

    Local Frame Slots     y0       int
                          y1       bool
                          y2       (list t1)
                          y3       (list t1)
                          y4       (list t1)
                          y5       (list t1)

    Internal Call Sites   length   (list t0) -> int


Figure 6.5: The Kernel Id definition and type-map of function h3 from Example 6.3. 


being conserved. For example, the type-map for function h3 from Example 6.3 is shown in 
Figure 6.5. Its type signature is $\forall t_1.\ t_1 \to (\text{list } t_1)$. Comparing these two together we get, 


$\mathcal{F}(TM_{h3}) = \{t_0, t_1\}$ 
$\mathcal{F}(t_1 \to (\text{list } t_1)) = \{t_1\}$ 


Therefore, the type-variable $t_0$ is not being conserved in the function h3 and it requires a 
run-time type-hint for proper type reconstruction. 


6.3.2 Propagating Non-Conserved Type Information across Functions 


In general, additional type-hints may need to be propagated within the body of a function not 
only to reconstruct its own non-conserved type-variables but to pass them on to other functions 
within its body that require those type-hints. Also, some of the non-conserved type-variables 
at these internal application sites may get partially or completely instantiated. We need to 
record these instantiations so that appropriate type-hints may be generated at those sites. 

Both the above problems may be addressed by viewing the reconstruction of the non- 
conserved type-variables as an overloaded operation trec? that must be resolved within the body 
of the given function. The standard overloading resolution mechanism picks up such unresolved 
overloaded identifiers and arranges the required information to be passed in as a parameter to 
the function. Subsequent uses of the function ensure that the additional information can be 
instantiated from the enclosing environment, thereby propagating the requirement outwards, if 
necessary. We illustrate this process for the function h3 of Example 6.3: 


Example 6.6: 
def f3 (trec? t0) x_(list t0) = 
  { def h3 (trec? t0) z_t1 = if length x_(list t0) == 1 
                             then z:nil 
                             else z:z:nil; 
    in h3 (trec? t0) }; 
g3 = if ... 
     then f3 (trec? int) (1:nil)_(list int) 
     else f3 (trec? bool) (true:nil)_(list bool); 
g3 2_int; 


Here, we have added a predicate† (trec? t0) as an annotation on the function h3. In general, 
a predicate is added to a function’s type signature for every non-conserved type-variable in 
its type-map at the precise argument position where that information is being lost according 
to Definition 6.2. Subsequently, the standard overloading resolution mechanism automatically 
propagates this predicate to the place where h3 is referenced and to the enclosing lexical function 
f3 because it remains uninstantiated (and hence unresolved) in its body. Finally, this predicate 
propagates to the application sites of f3 where it is completely instantiated according to the 
types of the arguments being supplied to f3 and is considered to be resolved. 

Intuitively, the propagation of a predicate associated with a function represents a lack of 
type information locally which must be supplied from the application site where this predicate is 
instantiated. Note that the predicate need not provide the full type of the argument or the free 
identifier of the function that requires such information (e.g., the identifier x in Example 6.6). 
It only identifies the instantiations of the non-conserved type-variables present in that type. 
This is sufficient to fully instantiate the type stored in the function’s type-map corresponding 
to that identifier. This scheme allows us to share the type instantiations of the non-conserved 
type-variables across several identifier types that contain that type-variable. Thus, the number 
of external type instantiations needed by a function is limited by the number of non-conserved 
type-variables and not by the number of its actual parameters or free identifiers present in its 
type-map. 

Another interesting observation is that predicate instantiations involving polymorphic type- 
variables are always considered as resolved and are not propagated outwards in the light of the 
discussion in Section 6.1.2. For instance, g3 in the above example might have been defined as: 


Example 6.7: 

g3 = if... 
     then f3 (trec? (list t)) (nil:nil) 
     else f3 (trec? bool) (true:nil); 


Here, (trec? (list t)) is an instantiation of f3’s predicate according to its polymorphic 
argument (nil:nil). Even though this predicate has an uninstantiated type-variable t, it is 
not propagated any further because it is polymorphic at this point. It follows immediately that 
there can be no unresolved predicates at the top-level because there are no free type-variables 
in the top-level type environment by construction. 


6.3.3 Program Translation 


The final step in our compilation process is to add extra hint parameters to the function 
definitions that have non-conserved type-variable predicates of the form (trec? t). Likewise, 


†We follow the terminology of [Gup90, WB89] where the usual Hindley/Milner type of a function is extended 
with predicates to model overloaded identifiers. In Haskell [HWe90] these are known as contexts. The predicate 
name trec? in our scheme stands for type-reconstructible?. 

a predicate (trec? $\tau$) appearing at a function application site is transformed into a type-hint 
encoding $\tau$ that is passed as an explicit argument at that application site. 

It is possible to either add one hint parameter for each non-conserved type-variable or 
group the hints together in a single hint-record from which the individual hints may be fetched. 
Our current scheme adds one hint parameter per type-variable at the position specified by 
Definition 6.2. This is because passing a small number of additional parameters is currently 
cheaper in our system than allocating and fetching from heap data-structures. 

The compiler keeps a record of the mapping between the non-conserved type-variables of 
each function and its additional hint parameters. This mapping, also called the hint-map, is 
shown below: 


Definition 6.3 (Hint-map) Given a function $f = \lambda x_1 \cdots x_n.E$ with non-conserved type-variables 
$\rho = \{\alpha_1, \ldots, \alpha_m\}$, its hint-map is the mapping $HM_f = \{(\alpha_1 \mapsto y_1), \ldots, (\alpha_m \mapsto y_m)\}$, 
where $y_1, \ldots, y_m$ are its new additional hint parameters. 


As an example, below we show the hint-map for the function h3 from Example 6.6: 


$HM_{h3} = \{(t_0 \mapsto \text{h3\_hint\_1})\}$ 


The actual type-hints may now be generated using an encoding of the type constructors 
and their type arguments. The encoding should permit type-hint construction and propagation 
from within the user program. Although not necessary, we may view the encoding as an Id 
datatype as shown below: 


type type_hint = none | tc string (list type_hint); 


The disjunct none encodes polymorphic type-variables that do not require any hint. The 
disjunct tc encodes a type-constructor by its name and a list of encoded type-parameters. The 
free type-variables of a type-hint $\tau$ are encoded using the corresponding additional parameters 
of the enclosing function definition recorded in its hint-map. 
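As an illustration of this encoding, the Python sketch below mirrors the two disjuncts with `None` standing in for none and a tuple `("tc", name, args)` standing in for tc. The helper names `tc`, `encode`, and `decode`, the `hint_params` dictionary, and the placeholder `"_"` for an unconstrained type-variable are assumptions introduced only for this sketch.

```python
# Assumed representation: a type-variable is a string; any other type is a
# tuple (constructor, arg_1, ..., arg_n).

def tc(name, *args):
    """Counterpart of the tc disjunct: constructor name plus encoded arguments."""
    return ("tc", name, list(args))

def encode(ty, hint_params):
    """Encode a type as a type-hint.

    hint_params maps a non-conserved type-variable of the enclosing function
    to the hint value already held in its hint parameter. Type-variables
    without an entry are polymorphic and are encoded as None (none).
    """
    if isinstance(ty, str):
        return hint_params.get(ty)
    return tc(ty[0], *[encode(arg, hint_params) for arg in ty[1:]])

def decode(hint):
    """Recover a type from a type-hint; "_" marks an unconstrained variable."""
    if hint is None:
        return "_"
    _, name, args = hint
    return (name,) + tuple(decode(arg) for arg in args)

# A hint for (list t0) inside a function whose hint parameter for t0
# currently holds the encoding of int.
hint = encode(("list", "t0"), {"t0": tc("int")})
print(hint)
print(decode(hint))

# A polymorphic instantiation such as (list t) needs no hint for t.
print(encode(("list", "t"), {}))
```

Encoding followed by decoding returns the original type whenever every type-variable has a hint, which is the property the reconstruction step depends on.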

Continuing with Example 6.6 above, the following translation is obtained: 


Example 6.8: 
def f3 f3_hint_1 x = 
  { def h3 h3_hint_1 z = if length x == 1 
                         then z:nil 
                         else z:z:nil; 
    in h3 f3_hint_1 }; 
g3 = if ... 
     then f3 (tc "int" nil) (1:nil) 
     else f3 (tc "bool" nil) (true:nil); 
g3 2; 


Notice how the hints generated within g3 propagate into h3 via the hint parameters of f3 
and h3. The appropriate hint will now be available in a dynamic activation of h3 where it may 
be used along with its type-map to reconstruct the exact run-time type of x. 
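The flow of the hint through the closure can be imitated in Python, with `functools.partial` playing the role of the partial application that builds the closure. Python is dynamically typed, so this sketch only shows how the hint value travels; the helper `make_g3` and the list encoding of Id lists are assumptions made for the illustration.

```python
from functools import partial

def tc(name, *args):
    """Encoded type-hint: constructor name plus encoded arguments."""
    return ("tc", name, list(args))

def f3(f3_hint_1, x):
    def h3(h3_hint_1, z):
        # h3_hint_1 is never used by the computation itself; it only sits in
        # the activation of h3 for the benefit of type reconstruction.
        return [z] if len(x) == 1 else [z, z]
    # Partial application: the hint is captured in the closure for h3.
    return partial(h3, f3_hint_1)

def make_g3(flag):
    # The two branches supply different hints, matching the argument types.
    if flag:
        return f3(tc("int"), [1])
    return f3(tc("bool"), [True])

g3 = make_g3(False)
print(g3.args)   # the hint stored in the closure
print(g3(2))     # final application supplies z
```

Even after the call to `f3` has returned, the closure `g3` still carries the encoded hint, which is the information a reconstruction step would later read from the activation of h3.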


6.4 Run-time Type Reconstruction 


Now, we have all the necessary information to reconstruct the entire run-time state of the 
machine. As discussed earlier in Section 6.2.1, the global dynamic environment and the tree of 

[Diagram: the Activation Tree, with frames carrying the hint parameters f3_hint_1 and h3_hint_1, and the Heap, holding the encoded type-hint "bool".] 

Type Reconstruction: 
    t1 = int 
    t0 = Decode[[ h3_hint_1 ]] = bool 



Figure 6.6: The Run-time State of Computation in Example 6.8. 


activation frames constitute the root set of the run-time state of the machine. All the relevant 
heap objects may be accessed through this root set. The types of the global identifiers are 
already available in the global static environment. Therefore, we only need to reconstruct the 
types of all the activation frames in order to obtain the types of all the objects in the root 
set. The type of any accessible heap object may then be reconstructed by examining the fully 
instantiated type of an appropriate pointer within the root set that leads to the given heap 
object. 

The detailed algorithm for complete type reconstruction of an activation frame will be 
presented in Chapter 7. Here, we describe a type reconstruction example to illustrate the 
modifications to the basic scheme presented in Section 6.1.1. These modifications use the type- 
hints inserted by the compilation scheme of Section 6.3 to account for the type information 
that is otherwise lost. 


6.4.1 A Type Reconstruction Example 


Figure 6.6 shows a snapshot of the state of the machine during the execution of translated 
Example 6.8. Let us suppose that the predicate (...) in the definition of g3 evaluates to false 
at run-time. The computation of g3 expands into an activation frame for f3, returning a 
closure for the function h3 with the appropriate type-hint and the second argument hidden 
inside. We assume that this computation has terminated and the activation frame for f3 
has been deallocated (shown with dotted lines in Figure 6.6) so that there is no trace of the 
application site where g3 was constructed. The evaluation of the application (g3 2) unfolds 
the computation into an activation frame for h3 as shown in Figure 6.6. Let us also suppose 
that the program is halted when h3 has just been invoked. The problem is to reconstruct the 
types of the objects in h3. 

The type-map of the function h3 given in Figure 6.5 shows that it needs the exact type 
instantiations of the type-variables $t_0$ and $t_1$ for proper type reconstruction. From the hint-map 
given in Section 6.3.3, we know that the additional parameter h3_hint_1 encodes the exact 
type instantiation for the type-variable $t_0$, which is decoded to produce the type bool. The type 
of the free identifier x within h3 may now be reconstructed to be (list bool) as given by its 
type-map. The remaining type-variable $t_1$ is instantiated to the type int as described earlier 
in Section 6.1.1 by matching the application site type instance recorded in the root type-map 
with the full type signature of h3. This completely instantiates h3’s type-map, yielding the 
exact types of its function parameter z and its other local identifiers. 
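The two sources of information used in this example, the decoded hint and the application site instance, can be combined as in the following Python sketch. The dictionaries `H3_TYPE_MAP`, `H3_HINT_MAP`, and `frame`, as well as the helper functions, are hypothetical stand-ins for the compiler-generated templates and the run-time frame; the detailed algorithm is the one deferred to Chapter 7.

```python
# Assumed representation: a type-variable is a string; any other type is a
# tuple (constructor, arg_1, ..., arg_n).
INT, BOOL = ("int",), ("bool",)

def lst(elem):
    return ("list", elem)

def decode(hint):
    """Turn an encoded hint ("tc", name, args) back into a type."""
    _, name, args = hint
    return (name,) + tuple(decode(arg) for arg in args)

def match(pattern, actual, subst):
    """Extend subst so that pattern, once instantiated, equals actual."""
    if isinstance(pattern, str):
        if subst.setdefault(pattern, actual) != actual:
            raise TypeError(f"conflicting instantiations for {pattern}")
        return subst
    if (isinstance(actual, str) or pattern[0] != actual[0]
            or len(pattern) != len(actual)):
        raise TypeError(f"cannot match {pattern} against {actual}")
    for p, a in zip(pattern[1:], actual[1:]):
        match(p, a, subst)
    return subst

def instantiate(ty, subst):
    if isinstance(ty, str):
        return subst.get(ty, ty)
    return (ty[0],) + tuple(instantiate(arg, subst) for arg in ty[1:])

# Static templates for h3 (hypothetical encodings of its type-map and hint-map).
H3_SIGNATURE = ("->", "t1", lst("t1"))
H3_TYPE_MAP = {"z": "t1", "x": lst("t0"), "y0": INT, "y1": BOOL,
               "y2": lst("t1"), "y3": lst("t1"), "y4": lst("t1"), "y5": lst("t1")}
H3_HINT_MAP = {"t0": "h3_hint_1"}

# Run-time information: the hint slot of h3's frame, and the type instance of
# the application (g3 2) recorded in the root type-map.
frame = {"h3_hint_1": ("tc", "bool", [])}
site_instance = ("->", INT, lst(INT))

# Step 1: non-conserved type-variables come from the decoded hints.
subst = {var: decode(frame[slot]) for var, slot in H3_HINT_MAP.items()}
# Step 2: conserved type-variables come from the visible application site.
match(H3_SIGNATURE, site_instance, subst)
# Step 3: instantiate every slot of the type-map.
frame_types = {name: instantiate(ty, subst) for name, ty in H3_TYPE_MAP.items()}

print(frame_types["x"], frame_types["z"])
```

The sketch keeps the two steps separate on purpose: the hint supplies only what the application site cannot, so the number of hints stays bounded by the number of non-conserved type-variables.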

As noted in Section 6.1.2, the type reconstruction schemes described earlier [App89, Gol91, 
GG92] would fail to reconstruct the type of x in the body of h3. The reason is that these schemes 
only use the type information derived from the current stack of activation frames. When higher-order 
closures such as g3 are invoked and type reconstructed, the function that produced them, here f3, 
may not be present on the current stack. Any clues that f3 might have provided regarding the 
types of free identifiers of g3 are therefore not accessible during reconstruction. 


6.5 Compiler Optimizations 


It might appear that our compilation scheme incurs a lot of run-time overhead due to additional 
parameters and encoding and decoding of types but our experience has been that realistic 
programs contain very few (if any) non-type-conserving functions, so the overhead of generating 
and propagating their type-hints is reasonably small. Although our current performance is 
adequate, we hope to be able to improve our scheme through several compiler optimizations 
that are discussed below. 


6.5.1 Rearranging the Hint Parameters 


Currently, additional type-hint parameters required by a function definition are placed just 
in front of the regular parameter that would otherwise lose that information according to 
Definition 6.2. This is not strictly necessary. We can place a hint parameter either before 
or after the first regular parameter whose type contains the non-conserved type-variable that 
is encoded by the hint parameter. This rearrangement does not affect program translation 
(Section 6.3.3) since the regular parameter and the associated type-hint parameter are still 
supplied together at the same application site. Of course, the hint parameters corresponding 
to the non-conserved type-variables in the types of the free identifiers of a function must still 
be placed right up front. 

The benefit of such rearrangement is that it may sometimes reduce the propagation overhead 
of type-hints by removing some extra parameters altogether via η-reduction. For example, the 
following alternate translation for Example 6.6 is also valid (compare with Example 6.8): 


Example 6.9: 
def f3 x = 
{ def h3 h3_hint_1 z = if length x == 1 
then z:nil 
else z:z:nil; 
in h3 }; 
g3 = if... 
     then f3 (1:nil) (tc "int" nil) 
     else f3 (true:nil) (tc "bool" nil); 

Here, the parameter f3_hint_1 of f3 was pushed after its parameter x, which made this 
η-reduction possible. 


6.5.2 Arity Analysis 


Definition 6.2 conservatively prescribes that the only type-variables that are conserved in a 
multiple-arity function are those present in its final application type because the function could 
be curried over its initial arguments. This definition can be specialized to include the types 
of all the arguments present at an application site, if that site is guaranteed to be accessible 
through the dynamic activation tree. That is, all arguments at an application site that leads 
to a full application may be treated as being conserved at that application site. For example: 


Example 6.10: 
def f l1 l2 l3 = (length l1)+(length l2)+(length l3); 
g = f (1:2:nil); 
... (g (true:nil) ("foo":nil)) ... 


Definition 6.2 predicts that the types of lists l1 and l2 are not conserved by the definition of 
f. But at the final application site for closure g, l2 is also available immediately, which implies 
that its type is conserved at this application site. 
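The effect of this refinement can be sketched by making the number of arguments visible at the final application site a parameter of the test. The function `lost_before_final_site` and its `args_at_final_site` parameter are hypothetical names used only for this illustration, and the type-variables t1, t2, t3 stand for the element types of the three lists.

```python
# Assumed representation: a type-variable is a string; any other type is a
# tuple (constructor, arg_1, ..., arg_n).
INT = ("int",)

def lst(elem):
    return ("list", elem)

def ftv(ty):
    """Free type-variables of a type."""
    if isinstance(ty, str):
        return {ty}
    found = set()
    for arg in ty[1:]:
        found |= ftv(arg)
    return found

def lost_before_final_site(arg_types, result_type, args_at_final_site=1):
    """Type-variables that still need hints when the final application site
    is known to supply the last args_at_final_site arguments together."""
    k = len(arg_types)
    split = k - args_at_final_site
    conserved = ftv(result_type)
    for ty in arg_types[split:]:       # arguments visible at the final site
        conserved |= ftv(ty)
    early = set()
    for ty in arg_types[:split]:       # arguments captured in the closure
        early |= ftv(ty)
    return early - conserved

f_args = [lst("t1"), lst("t2"), lst("t3")]

print(lost_before_final_site(f_args, INT, 1))  # conservative case
print(lost_before_final_site(f_args, INT, 2))  # closure g applied to two arguments
print(lost_before_final_site(f_args, INT, 3))  # full-arity application
```

With one visible argument the sketch agrees with the conservative definition, while a full-arity site leaves nothing to be hinted for the arguments.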

In general, at compile-time, it may not be possible to recognize the application of an arbi- 
trary function closure as its final application site. But it is easy to recognize the special case of 
first-order (or full-arity) application of a function where all its arguments are supplied at once. 
In such cases, the types of all the actual arguments and the type-variables present in them may 
be instantiated from its application site, although the function may still require type-hints in 
order to reconstruct the types of its free identifiers. 

In our current scheme, it is not possible to optimize away the type-hints prescribed by 
Definition 6.2 for a function at its first-order application sites because the function definition 
may still require type-hint parameters due to higher-order application sites present elsewhere. 
This is simply a consequence of our choice to provide type-hints by adding extra parameters 
to a function’s definition. Alternatively, we can either generate a specialized first-order version 
of the function that does not carry any type-hints and use it wherever possible, or choose 
another mechanism for hint propagation that is transparent to the usual parameter passing 
conventions. Then, we would be able to tailor the type-hints according to the information 
available at a particular call site without affecting the function’s definition. 


6.5.3 Escape Analysis 


Together with first-order call site information, if the types of the free identifiers of a function 
are also known to be reconstructible via the currently visible activation tree, then no extra 
type-hints are necessary at all, even if the function was determined to be non-type-conserving 
by Definition 6.2. Escape analysis of function closures offers this information. Specifically, 
if analysis shows that a function closure does not escape from the lexical scope where it was 
defined, then the correct instantiations of its free identifiers would still be available from the 
activation frame of this ancestor in the activation tree. In that case, we do not need to set 
up extra type-hints to reconstruct these instantiations within the given function’s activation 
frame. 

It is possible to use the region-based closure typing system described in Part I of this thesis 
to undertake such escape analysis for internal function closures. We simply need to associate a 
fresh region variable with each internal function definition that statically tracks the movement 
of its closure data-structure. Presence of this region variable in the type environment of the 
enclosing control block, or in the type of the returned value from that block would indicate that 
the function closure is escaping the scope of its definition. 


6.5.4 Tail Calls 


Our current scheme does not deal with tail calls where the usual caller-callee relationship is 
violated. A tail call removes the caller’s activation frame from the activation tree and connects 
the callee to the parent of the caller directly. In such a situation, the application site information 
for the callee is lost. Consider the following example: 


Example 6.11: 
def f x = 1 + length x; 
def g n = if n == ... 
          then f (1:2:nil) 
          else f (true:nil); 


g ...; 


Without tail calls, the type of x in an activation of f can be determined by locating its call 
site within the then or the else branch of the conditional inside g. But, if these applications 
were compiled as tail calls, then f’s activation will get directly connected to the top-level 
and the call site information will be lost. 

It is easy to extend our scheme to deal with this situation. We simply modify Definition 6.2 
to reflect the fact that no call site information is available for f and therefore explicit type-hints 
may be needed for all of its free type-variables. This leads to the following translation: 


Example 6.12: 
def f f_hint_1 x = 1 + length x; 
def g n = if n == ... 
          then f (tc "int" nil) (1:2:nil) 
          else f (tc "bool" nil) (true:nil); 


g ...; 


Now, all the type information is available from within the activation of f. Of course, this 
scheme is not optimal because it ignores the call site information even when it is available using 
regular calling conventions. In order to incorporate that flexibility, we need to generate several 
application site specific versions of the function definition as discussed earlier. 


6.5.5 Type Specialization 


Our current scheme generates and interprets encoded type information in order to reconstruct 
the types of all local and free identifiers of a function. We do not take any position on what 
to do with these types. This strategy is adequate and desirable for a source debugger because 
it may wish to manipulate an object in many different ways. Once the type of the object is 
reconstructed, it can be interpreted to traverse and manipulate the object in any desired way. 

It is possible to apply the principle of type conservation (Definition 6.2) and the program 
analysis and translation strategy (Section 6.3) in any specific context to allow complete analysis 
of run-time objects in that context. For instance, in order to display objects in the Id debugger, 
we could compile a parameterized display routine for every datatype occurring in the program. 
Run-time type reconstruction would be used to compose these display routines appropriately 
and then display the given object directly by passing it to its display routine without any type 
interpretation. 
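A minimal Python sketch of this idea is given below: each datatype gets a display routine parameterized by the routines for its type-parameters, and a reconstructed type is used once to compose them. The routine names, the `DISPLAY` table, and the printed format are assumptions for illustration and do not describe the Id debugger.

```python
# Assumed representation of reconstructed types: a type-variable is a string;
# any other type is a tuple (constructor, arg_1, ..., arg_n).

def show_int(value):
    return str(value)

def show_bool(value):
    return "true" if value else "false"

def show_list(show_elem):
    """Display routine for lists, parameterized by the element routine."""
    def show(values):
        return "(" + ":".join([show_elem(v) for v in values] + ["nil"]) + ")"
    return show

def show_unknown(value):
    # Used for a type-variable that remained polymorphic.
    return "?"

# One parameterized routine per datatype (hypothetical table).
DISPLAY = {
    "int": lambda: show_int,
    "bool": lambda: show_bool,
    "list": show_list,
}

def display_routine(ty):
    """Compose the display routine for a fully reconstructed type."""
    if isinstance(ty, str):
        return show_unknown
    constructor, *args = ty
    return DISPLAY[constructor](*[display_routine(arg) for arg in args])

show_x = display_routine(("list", ("bool",)))
print(show_x([True]))

show_nums = display_routine(("list", ("int",)))
print(show_nums([1, 2]))
```

After composition the object is displayed by a direct call, so the type itself is consulted only once rather than being interpreted during the traversal.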

Similarly, it is possible to generate specialized garbage collection routine(s) for every func- 
tion instead of its type-map, parameterized by GC-routines that correspond to the free type- 
variables in its type-map. Then, we can generate and propagate closures of GC-routines instead 
of type-hints as described in Section 6.3. These parameter routines would be picked up auto- 
matically by the GC-routine(s) of the function from its activation frame at the time of garbage 
collection. This scheme would operate in the same way as the tagless garbage collection mecha- 
nism proposed by Goldberg [Gol91] where function-specific and site-specific garbage collection 
routines are generated that understand the structure and the liveness properties of the local 
identifiers of a function. Moreover, no additional hash-tables would be necessary in order to 
keep track of partially traversed polymorphic shared objects as shown in [GG92] because com- 
plete type reconstruction ensures that the entire traversal of a shared object can be completed 
the very first time it is encountered. 


6.6 Implementation Status 


The type reconstruction scheme described in this chapter has been implemented in two different 
applications within the Id programming environment. We briefly discuss these implementations 
below. 


6.6.1 Type Reconstruction in a Polymorphic Source Debugger 


The need to solve the problem of type reconstruction initially arose while attempting to display 
polymorphic objects within a source-level debugger for Id. A preliminary version of the type 
reconstruction scheme described in this chapter was implemented during the fall of 1992 in 
the context of the Id source debugger [Car93] for the Monsoon dataflow architecture and was 
reported in [AC93]. 

The Id compiler [Tra86] was modified to perform the type analysis and hint generation for 
every function within the user program as shown in Section 6.3. A simple Id datatype encoding 
was used for type-hints as shown in Section 6.3.3. The compiler also generated the type-map 
and the hint-map for every function. In order to reduce the book-keeping within the debugger, 
the types of temporary, internal identifiers were dropped from the type-map of a function; only 
source-level, user-defined identifiers were kept together with their position in the function’s 
activation frame. 

The Id debugger [Car93] was written in Lisp and executed on the front-end host processor. 
It allowed a user to stop the Id program executing on the back-end Monsoon processor 
when certain pre-specified events were triggered. The user could then traverse the current tree 
of activation frames within the Monsoon memory and request function arguments and local 
identifiers to be displayed along with any heap objects that they pointed to. Objects within 
the Id run-time system did not carry any type-tags; therefore, complete type reconstruction 
was needed in order to decipher the run-time object structure. The debugger reconstructed the 
object types one frame at a time using the run-time type-hints and the type-map and the hint- 
map information provided by the compiler. These types were then interpreted to traverse and 
display the contents of the requested identifiers properly. Objects hidden inside higher-order 
function closures were not displayed, although such objects could be displayed once the closure 
was applied and gave rise to an activation frame. 

The entire Id programming environment, called “Id-World”, containing an editor-based 
incremental Id compiler, a simulator for the Monsoon architecture, and the Id source debugger 
with complete polymorphic run-time type reconstruction, was successfully demonstrated during 
the ACM Conference on Functional Programming Languages and Computer Architecture held 
in Copenhagen, Denmark, in June 1993. 

In [AC93], we presented a preliminary compilation and type reconstruction scheme which 
omitted some of the formal details. The complete compilation scheme and the type reconstruction 
algorithm now appear in Chapter 7 along with a proof of its correctness. 


6.6.2 Type Reconstruction for Tagless Garbage Collection 


During the fall of 1993, full support for run-time type reconstruction was integrated into the 
Id compiler for the *T multi-threaded architecture and its run-time system [CCFT93] for the 
purpose of performing tagless garbage collection. Naturally, this required complete type recon- 
struction for every slot of every function activation frame and all the heap objects reachable 
from them including higher-order function closures. 

We conducted a feasibility study involving the design and implementation of a simple “mark- 
and-sweep” garbage collector for the *T architecture based on the run-time type reconstruction 
mechanism. We compared the performance of this scheme against a conservative garbage collector 
and a compiler-directed explicit allocation/deallocation scheme, all implemented within 
the same framework. The results of this study were first reported in [AFH94] and are presented 
here in Chapter 8. The study showed that tagless garbage collection based on type reconstruc- 
tion was not only feasible but also beneficial for scientific programs with large scalar arrays. 
The study also indicated that the type reconstruction cost was a small fraction of the overall 
garbage collection cost. Complete details appear in Chapter 8. 


Chapter 7 


Formal Framework for Run-time 
Type Reconstruction 


In this chapter, we formalize the reconstruction strategy outlined in the last chapter. Section 7.1 
presents the complete grammar for our intermediate language Kernel Id. In Section 7.2, we 
describe a compilation scheme that analyzes the source program to identify the additional type 
information necessary for complete type reconstruction and then transforms the program to 
propagate this information at run-time. Section 7.3 presents the run-time type reconstruction 
algorithm and discusses its complexity. Finally, in Section 7.4 we show the correctness of our 
algorithm. 


7.1 The Kernel Id Intermediate Language 


Our description of type reconstruction is based on the intermediate language Kernel Id 
shown in Figure 7.1. This language supports a rich set of datatypes including typical 
scalar basetypes, general algebraic (sum-of-products) datatypes, n-dimensional arrays, and 
curried functions. Records and tuples are a special case of algebraic datatypes with a single 
product disjunct. We also assume a rich set of primitive functions for basetypes and array 
construction/selection/modification, as well as standard predefined algebraic datatypes such as 
list and bool. 

Kernel Id allows multi-arity function definitions and general algebraic type declarations. 
Every sub-expression in this language is given an explicit name that permits accurate representation 
of data-sharing. In particular, we assume that every λ-expression has an identifier 
name associated with it, i.e., λ-expressions are only allowed to occur on the right hand side 
of a binding. Simply nested let-bindings are generalized to a recursive letrec-style block of 
bindings. Similarly, a 2-way conditional operator (if...then...else...) is generalized to an 
m-way Case dispatch operator. The semantics of this language has been given directly in terms 
of graph rewriting rules as shown in [AA91, AA94]. However, we will use the operational 
machinery described in Chapter 3 while showing the correctness of our type reconstruction 
algorithm. 

Kernel Id is a more realistic abstraction of the actual intermediate form used in the Id compiler 
[AA91, Tra86] than the tiny expression language used in Chapter 3. The Id source language 
supports special syntactic constructs such as list and array comprehensions, complex 
pattern matching, and nested function and type declarations [Nik91]. During compilation, 
the Id source program is translated into a Kernel Id program using standard front-end analyses 
and transformations such as comprehension-desugaring, scope-analysis, type-checking, and 
pattern-matching compilation [AA91, Gup90, Tra86]. These transformations result in a Kernel 
Id program where every sub-expression has a unique name and a well-defined Hindley/Milner 
type, so that all internal type declarations can be lifted to the top-level. Although we use 
source Id syntax in our examples, their correspondence to a Kernel Id program should be easy 
to follow. 


EXPRESSIONS 

    c            ∈  Constant 
    f, x, y, z   ∈  Identifier 
    SE           ∈  Simple Expression 
    E            ∈  Expression 
    PF^n         ∈  Primitive Fn. with n arguments 
    Case^m_T     ∈  m-way Case Dispatch for type T 
    C_m^{k_m}    ∈  m-th Constructor Identifier with k_m arguments 

    Constant     ::=  Integer | Float 
    SE           ::=  Identifier | Constant 
    E            ::=  SE | PF^n (SE_1, ..., SE_n) | Case^m_T SE (E_1 ⋯ E_m) 
                   |  λx_1 ⋯ x_n. E | SE_1 SE_2 | Block 
    Block        ::=  { [Binding;]* in SE } 
    Binding      ::=  Identifier = E 
    Declaration  ::=  Binding | Type-Decl 
    Type-Decl    ::=  type T α_1 ⋯ α_n = C_1 τ_{11} ⋯ τ_{1k_1} 
                                       | ⋯ 
                                       | C_m τ_{m1} ⋯ τ_{mk_m} 
    Program      ::=  [Declaration;]* E 


Figure 7.1: The Kernel Id Intermediate Language. 


7.2 Compiler Support for Type Reconstruction 


7.2.1 A Type System for Computing Type-hints 


Figure 7.2 shows a systematic way of performing the type-hint analysis and propagation discussed 
informally in Chapter 6 within the context of the Kernel Id intermediate language. We have 
modified the usual Hindley/Milner typing rules [Mil78] to compute and propagate additional 
type-hint information. In this system, the conventional Hindley/Milner type of a function 
closure (τ_1 → τ_2) is prefixed with the set of type-variables ρ that are not conserved in its 
immediately previous partial application.¹ 

Definition 6.2 identifies the exact set of non-conserved type-variables at each argument 
position of a multi-arity, user-defined function. Type-conserving positions are assigned the 
empty set φ. Each type-variable t ∈ ρ may be taken to represent the overloading predicate 
(trec? t) as shown in Section 6.3.2. Type schemes σ generalize and instantiate such augmented 
types as usual. We derive typing judgments of the following form: 

    TE ⊢ E : τ, ρ 

Here, TE is a type environment mapping identifiers to type schemes, τ is the type assigned to 
the expression E, and ρ is the set of type-hint instantiations within E that are needed during 
its type reconstruction. These type-hints are required when a non-type-conserving function 
is referenced or is applied inside the expression E. All such type instantiations are collected 
and propagated up to the nearest enclosing function definition where they become part of that 
function’s type-hint requirements. 


¹Although ρ is defined here to be a set, the ordering of the type-variables within the set would become 
important when we translate their type instantiations into actual type-hint parameters. 


TYPES 

    α, β  ∈  Type-Variable 
    T^n   ∈  Type-Constructor with n type arguments 
    τ     ∈  Type 
    ρ     ∈  Type-hint Set = POWER-SET(Type) 
    σ     ∈  Type Scheme 
    TE    ∈  Type Environment = Identifier → Type Scheme 

    τ  ::=  α | int | float | (nd_array τ) 
         |  (T^n τ_1 ⋯ τ_n) | τ_1 → τ_2 | ρ.τ 
    σ  ::=  ∀α_1 ⋯ α_n.τ 


CONST:    typeof(c) ≻ φ.τ 
          ---------------------------------------------------------------- 
          TE ⊢ c : τ, φ 

PRIMAPP:  typeof(PF^n) ≻ φ.(τ_1 → ⋯ τ_n → τ_{n+1}), φ 
          TE ⊢ SE_i : τ_i, ρ_i                              1 ≤ i ≤ n 
          ---------------------------------------------------------------- 
          TE ⊢ PF^n (SE_1, ..., SE_n) : τ_{n+1}, ∪_{1≤i≤n} ρ_i 

CASE:     TE ⊢ SE : (T τ_1 ⋯ τ_n), ρ_0 
          TE ⊢ E_i : τ, ρ_i                                 1 ≤ i ≤ m 
          ---------------------------------------------------------------- 
          TE ⊢ Case^m_T SE (E_1 ⋯ E_m) : τ, ∪_{0≤i≤m} ρ_i 

IDENT:    TE(x) ≻ ρ.τ 
          ---------------------------------------------------------------- 
          TE ⊢ x : τ, ρ 

APP:      TE ⊢ SE_1 : (τ′ → ρ.τ), ρ_1        TE ⊢ SE_2 : τ′, ρ_2 
          ---------------------------------------------------------------- 
          TE ⊢ SE_1 SE_2 : τ, (ρ ∪ ρ_1 ∪ ρ_2) 

ABS:      TE + {x_1 ↦ τ_1, ..., x_n ↦ τ_n} ⊢ E : τ_{n+1}, ρ 
          Let TM be the type-map of (λx_1 ⋯ x_n.E) 
          ρ_0 = F(TM) \ F(τ_1 → ⋯ τ_n → τ_{n+1}) 
          ρ_i = F(τ_i) \ F(τ_{i+1} → ⋯ τ_n → τ_{n+1})        1 ≤ i < n 
          ρ′  = F(ρ) \ (ρ_0 ∪ ⋯ ∪ ρ_{n-1}) 
          ---------------------------------------------------------------- 
          TE ⊢ λx_1 ⋯ x_n.E : ρ_0.(τ_1 → ρ_1.(τ_2 → ⋯ ρ_{n-1}.(τ_n → ρ′.τ_{n+1}) ⋯)), φ 

BLOCK:    TE + {x_i ↦ τ_i} ⊢ E_i : τ_i, ρ_i                  i ∈ b_1 
          TE_{b_1} = TE + {x_i ↦ Gen(TE, τ_i)}               i ∈ b_1 
            ⋮ 
          TE_{b_{k-1}} + {x_i ↦ τ_i} ⊢ E_i : τ_i, ρ_i        i ∈ b_k 
          TE_{b_k} = TE_{b_{k-1}} + {x_i ↦ Gen(TE_{b_{k-1}}, τ_i)}    i ∈ b_k 
          TE_{b_k} ⊢ SE : τ_0, ρ_0 
          ---------------------------------------------------------------- 
          TE ⊢ {x_1 = E_1; ⋯; x_n = E_n in SE} : τ_0, ∪_{0≤i≤n} ρ_i 


Figure 7.2: Rules for computing Non-Conserved Type Information for Kernel Id Programs. 

In Figure 7.2, predefined constants and primitive functions (the CONST and PRIMAPP 
rules) do not give rise to any non-conserved type-variables since they always execute within 
the current activation frame and never create any partial applications. The CASE rule is 
also straightforward. It simply collects the type-hint instantiations inductively from its sub- 
expressions while ensuring that all branches have the same type. 

The IDENT rule instantiates the augmented type of a user-defined function, exposing the 
exact instantiations of its non-conserved type-variables that need to be provided at that point. 
The augmented type instantiation is immediately split into the actual type τ and the set of type-hint 
instantiations ρ. Note that the size of the set ρ remains fixed during its type instantiation. 
In particular, an empty set of type-variables φ can never be instantiated to a non-empty set of 
type instantiations and vice-versa.² 

New type instantiations may also be introduced by the APP rule, where the augmented type 
of the result closure exposes the exact instantiations of the non-conserved type-variables at 
that application site. All such instantiations are collected and propagated to be resolved at the 
nearest enclosing λ-expression. 

The ABS rule computes the set of non-conserved type-variables of a λ-expression and records 
them within its augmented type so that they may be instantiated later by the IDENT rule or 
the APP rule. The type-hint sets ρ_0 ⋯ ρ_{n-1} are computed for each argument position of the 
function as given by Definition 6.2. These sets are placed along the type signature of the 
function just after the argument position where that type information would otherwise be lost. 
Type-variables that are conserved at the various argument positions are excluded from the 
corresponding type-hint sets. The final type-hint set ρ′ computes the additional type-variables 
for which type-hints are required by internal sub-expressions of the λ-body. 
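
The set computations of the ABS rule can be illustrated with a small Python sketch. It is a simplification under stated assumptions: type-variables are plain strings, constructed types are tuples such as ("list", "t0"), hint sets embedded inside argument types are ignored, and the names ftv and hint_sets are hypothetical.

```python
def ftv(ty):
    """Free type-variables F(τ): variables are strings, constructors are tuples."""
    if isinstance(ty, str):
        return {ty}
    return set().union(*(ftv(arg) for arg in ty[1:]))

def hint_sets(arg_types, result_type, type_map, body_hints):
    """Return ([ρ_0, ..., ρ_{n-1}], ρ′) for λx_1 ⋯ x_n.E.

    arg_types   -- [τ_1, ..., τ_n]
    result_type -- τ_{n+1}
    type_map    -- identifier -> type, for all identifiers of the function
    body_hints  -- hint instantiations ρ collected from the body E
    """
    def tail_ftv(i):
        # F(τ_{i+1} → ⋯ τ_n → τ_{n+1}); arg_types[i] holds τ_{i+1}.
        return ftv(result_type).union(*(ftv(t) for t in arg_types[i:]))

    tm_vars = set().union(*(ftv(t) for t in type_map.values()))
    rho = [tm_vars - tail_ftv(0)]                         # ρ_0
    for i in range(1, len(arg_types)):                    # ρ_i, 1 ≤ i < n
        rho.append(ftv(arg_types[i - 1]) - tail_ftv(i))
    body_vars = set().union(*(ftv(t) for t in body_hints))
    rho_prime = body_vars - set().union(*rho)             # ρ′
    return rho, rho_prime

# def f1 x y = y;  with x : t0 and y : t1
print(hint_sets(["t0", "t1"], "t1", {"x": "t0", "y": "t1"}, []))
```

For f1 the first argument type t0 does not occur in the remaining signature, so the sketch yields ρ_1 = {t0} while ρ_0 and ρ′ are empty, which agrees with the signature φ.(t0 → {t0}.(t1 → φ.t1)) given for f1 in Example 7.1.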

The BLOCK rule is a generalization of the usual Hindley/Milner LET rule as applied to the 
more complex syntax of the Kernel Id language. The type generalization operation Gen(TE, τ) 
generalizes the augmented type τ (which may contain embedded type-hint sets) into a type 
scheme ∀α_1 ⋯ α_n.τ. We assume that the bindings in a block, numbered 1...n, are partitioned 
into k groups of mutually recursive bindings b_1 ⋯ b_k (b_1 + ⋯ + b_k = n), and these groups 
are topologically sorted such that definitions occur before their uses. Each group of mutually 
recursive bindings is type-checked within a type environment that assigns polymorphic type 
schemes to the identifiers bound in previous groups and monomorphic types to the identifiers 
bound within the same group. This transformation maximizes Hindley/Milner polymorphism 
for an unordered sequence of bindings [Gup90, HWe90]. 


²This property ensures that each type-variable instantiation may be treated as an independent parameter 
to be inserted at that site during translation, although it may introduce some subtle typing discrepancies as 
discussed in Section 7.2.4. 


7.2.2 Type Inference 


The type system shown in Figure 7.2 may be directly used as a basis for automatically inferring 
augmented Hindley/Milner types along the lines of the standard Hindley/Milner type inference 
algorithm [Mil78]. The type-hint sets are considered to be ordered and of fixed size, and may 
be treated as part of the type signature of a function. In particular, note that a non-empty 
set can never be unified with an empty set. Therefore, the usual structural term unification 
algorithm [Rob65] would suffice for matching types.³ 

The type inference algorithm would be similar to the infer algorithm cited in Section 3.4 with 
minor modifications. We need to do some book-keeping in order to collect and propagate type-hint 
instantiations from within expressions and process them at the enclosing function definition. 
The modified type inference algorithm would return the possibly augmented Hindley/Milner 
type (τ) of an expression along with the set of type-hint instantiations (ρ) gathered from within 
the expression. Type generalization, instantiation, and substitution would now take place on 
augmented types. In case of a user-defined function, the algorithm would also compute its 
type-map as given by Definition 6.1 and the type-hint sets ρ_0 ⋯ ρ_{n-1}, ρ′ as shown in the ABS 
rule. These sets would then be attached to their appropriate argument positions within the 
type signature of the function. 


7.2.3 Program Translation and Type-Hint Generation 


The final step in the compilation process is to add explicit parameters to functions with non- 
trivial augmented types and to provide appropriate type-hints at their application sites. 

Generation of type-hints uses a run-time encoding and decoding scheme as shown in Figure 7.3. 
The encoding is performed under a Translation Environment that maps free type-variables 
of a given type τ to value-domain identifiers encoding those type-variables. The 
encoding scheme TEnc[] produces a Kernel Id expression which, when executed at run-time, 
generates the type-hint encoding for the given type scheme; it does not generate the encoded 
type scheme itself. This is so because the encoding scheme is used as part of the source-to-source 
compilation process that translates a Kernel Id program into another Kernel Id program 
with explicit type-hint propagation. 

For each type constructor T^n, we denote its run-time encoding by a new constant T̄^n. A 
bound type-variable α_i in a type scheme ∀α_1 ⋯ α_n.τ is encoded as a special constant type-constructor 
T̄_i^0. A family of Kernel Id primitive functions pack^n with arity n is used to pack 
an encoded type constructor and its arguments into a run-time data-structure. 

The decoding scheme TDec[] is used at run-time to convert the encoded type-hints into 
actual type schemes used during run-time type reconstruction. Although this mechanism is 
described as the logical inverse of encoding type schemes, the actual decoding format depends 
on the data format used within the run-time system for type reconstruction.⁴ 
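
The round trip between encoding and decoding might be sketched in Python as follows. Unlike TEnc[], which emits a Kernel Id expression that builds the encoding when executed, this sketch builds the encoded value directly. The tuple layout, the "pack" and "bound" tags, and all function names are assumptions made for illustration.

```python
def ftv(ty):
    """Free type-variables: variables are strings, constructors are tuples."""
    if isinstance(ty, str):
        return {ty}
    return set().union(*(ftv(a) for a in ty[1:]))

def encode(ty, env):
    """Encode a type; env maps free type-variables to already-encoded hints."""
    if isinstance(ty, str):
        return env[ty]                       # value of the hint for this variable
    name, *args = ty
    return ("pack", name, *(encode(a, env) for a in args))

def encode_scheme(bound, ty, env):
    """Encode a type scheme; each bound variable becomes a special constant."""
    bound_env = {a: ("bound", i) for i, a in enumerate(bound)}
    return encode(ty, {**env, **bound_env})

def decode_type(enc):
    if enc[0] == "bound":
        return f"a{enc[1]}"                  # fresh variable name for a bound constant
    _, name, *args = enc
    return (name, *(decode_type(a) for a in args))

def decode_scheme(enc):
    """Decode and generalize over every variable left in the decoded type."""
    ty = decode_type(enc)
    return sorted(ftv(ty)), ty

int_hint = encode(("int",), {})
print(encode(("list", "t0"), {"t0": int_hint}))
print(decode_scheme(encode_scheme(["t"], ("->", "t", "t"), {})))
```

In the first call the free variable t0 is resolved through the environment, mirroring how a hint parameter supplies the encoding of a non-conserved type-variable; in the second, the bound variable survives the round trip as a generalized variable.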

The program translation and hint generation scheme TExp[] is shown in Figure 7.4. This 
translation is guided by the typing judgments derived from the typing rules shown in Figure 7.2. 
The translation rules operate under a Translation Environment that maps free non-conserved 
type-variables of a function definition to its type-hint parameters. 


³The careful reader might note that performing structural type matching on the type-hint sets may reject 
some programs that would be considered to be type-correct in the original Hindley/Milner type system without 
such sets. We will discuss this issue in Section 7.2.4. 

⁴In our current implementation discussed in Chapter 8, the data format used for encoding type-hints is the 
same as that used within the run-time system for type reconstruction; therefore, no decoding is necessary. 


    ε       ∈  Translation Environment = Type-Variable → Identifier 
    σ̄       ∈  Encoded Type Scheme 
    TEnc[]  ∈  Type Scheme → Translation Environment → EXPRESSION 
    TDec[]  ∈  Encoded Type Scheme → Type Scheme 


TYPE SCHEME ENCODING 

    TEnc[α] ε  =  ε(α) 

    Let z, z_1, ..., z_n be new identifiers, 
    TEnc[(T^n τ_1 ⋯ τ_n)] ε  =  { z_1 = (TEnc[τ_1] ε); 
                                   ⋯ 
                                   z_n = (TEnc[τ_n] ε); 
                                   z = pack^{n+1}(T̄^n, z_1, ..., z_n); 
                                 in z } 

    TEnc[∀α_1 ⋯ α_n.τ] ε  =  TEnc[τ] (ε + {α_i ↦ T̄_i^0})        1 ≤ i ≤ n 


TYPE SCHEME DECODING 

    TDec[σ̄]  =  ∀α_1 ⋯ α_n.TDec′[σ̄] 
                where {α_1, ..., α_n} = F(TDec′[σ̄]) 

    TDec′[T̄_i^0]                    =  α_i 
    TDec′[(T̄^n, τ̄_1, ..., τ̄_n)]    =  (T^n TDec′[τ̄_1] ⋯ TDec′[τ̄_n]) 


Figure 7.3: Encoding and Decoding of Type Schemes. 


Most of the translation rules are straightforward. Constants do not require any translation. 
The rules for primitive application, Case-expression, and block recursively translate their sub- 
expressions. 

The translation of a function identifier converts the exact instantiations of its non-conserved 
type-variables into explicit type-hint arguments using the encoding shown in Figure 7.3. Simi- 
larly, the translation of a function application inserts appropriate type-hints at that application 
site as directed by the function signature. 

The translation of a λ-expression adds explicit hint parameters y_1 ⋯ y_m at the appropriate 
position corresponding to each non-conserved type-variable obtained from its typing judgment. 
We also record this mapping as the hint-map of the λ-expression and use it to extend the 
translation environment for the body of the given λ-expression. 

We assume that the type-map (Definition 6.1) of a λ-expression is updated to reflect the 
new type-hint parameters that are added to its type signature and the new local bindings that 
are created within its body during the translation. This change does not affect the set of 
free type-variables of the type-map because encoded type-hints have a fixed, pre-defined non- 
polymorphic type, and the types for all other additional identifier bindings are already present 
within the type-map. 

After this program transformation, all the type information needed to fully instantiate the 
type-map of a function is available at run-time within its function activation frame, either 
directly as run-time type-hints or indirectly via instantiations of conserved type-variables in 
its type-map. In the next section, we will show a type reconstruction algorithm that uses this 
information at run-time to reconstruct the complete dynamic state of the machine. 


    ε       ∈  Translation Environment = Type-Variable → Identifier 
    TExp[]  ∈  Expression → Translation Environment → EXPRESSION 


CONST:    TExp[c] ε  =  c 

PRIMAPP:  Let z, z_1, ..., z_n be new identifiers, 
          TExp[PF^n SE_1 ⋯ SE_n] ε  =  { z_1 = (TExp[SE_1] ε); ⋯ 
                                          z_n = (TExp[SE_n] ε); 
                                          z = PF^n z_1 ⋯ z_n; 
                                        in z } 

CASE:     Let z, z_0 be new identifiers, 
          TExp[Case^m_T SE (E_1 ⋯ E_m)] ε  =  { z_0 = (TExp[SE] ε); 
                                                 z = Case^m_T z_0 ( (TExp[E_1] ε) ⋯ 
                                                                    (TExp[E_m] ε) ); 
                                               in z } 

IDENT:    Given typing judgment TE ⊢ x : τ, ρ where ρ = {τ_1 ⋯ τ_n}, 
          Let z, z_1, ..., z_n be new identifiers, 
          TExp[x] ε  =  { z_1 = (TEnc[τ_1] ε); ⋯ 
                          z_n = (TEnc[τ_n] ε); 
                          z = x z_1 ⋯ z_n; 
                        in z } 

APP:      Given typing judgment TE ⊢ SE_1 SE_2 : τ, (ρ ∪ ρ_1 ∪ ρ_2) where ρ = {τ_1 ⋯ τ_n}, 
          Let z, z′, z″, z_1, ..., z_n be new identifiers, 
          TExp[SE_1 SE_2] ε  =  { z′ = (TExp[SE_1] ε); 
                                  z″ = (TExp[SE_2] ε); 
                                  z_1 = (TEnc[τ_1] ε); ⋯ 
                                  z_n = (TEnc[τ_n] ε); 
                                  z = z′ z″ z_1 ⋯ z_n; 
                                in z } 

ABS:      Given typing judgment TE ⊢ λx_1 ⋯ x_n.E : ρ_0.(τ_1 → ⋯ ρ_{n-1}.(τ_n → ρ′.τ_{n+1}) ⋯), φ 
          where (ρ_0 ∪ ⋯ ∪ ρ_{n-1} ∪ ρ′) = {α_1 ⋯ α_m}, 
          Let y_1, ..., y_m be new parameters with hint-map HM = {α_1 ↦ y_1, ..., α_m ↦ y_m}, 
          TExp[λx_1 ⋯ x_n.E] ε  =  λ y_{ρ_0} x_1 y_{ρ_1} ⋯ y_{ρ_{n-1}} x_n y_{ρ′}. (TExp[E] (ε + HM)) 

BLOCK:    Let z be a new identifier, 
          TExp[{x_1 = E_1; ⋯; x_n = E_n in SE}] ε  =  { x_1 = (TExp[E_1] ε); ⋯ 
                                                        x_n = (TExp[E_n] ε); 
                                                        z = (TExp[SE] ε); 
                                                      in z } 


Figure 7.4: Program Translation and Hint Generation Rules. 


7.2.4 Discussion 

Type Mismatch in Curried Functions 


The augmented type system presented in Section 7.2.1 above is a straightforward modification 
of the standard Hindley/Milner type system, but it has one drawback: it restricts the 
set of type-correct programs to those that are also type-reconstructible. In particular, this 
system may reject some programs that would have been considered type-correct in the usual 
Hindley/Milner type system without any type-hint sets. The following example illustrates this 
point: 


Example 7.1: 


def f1 x y = y;                           % f1 :: ∀t0 t1. φ.(t0 → {t0}.(t1 → φ.t1)) 
def f2 x =                                % f2 :: ∀t0 t1. φ.(t0 → φ.(t1 → φ.t1)) 
  { def h2 y = y; 
    in h2 }; 
g1 = (if ... then f1 else f2);            % Static Type Error! 
g2 = if ... then f1 1 (int) else f2 1;    % No Static Type Error. 

The functions f1 and f2 have the same type signature in the usual Hindley/Milner type 
system but they have different type signatures in the current system because f1 requires a type-hint 
for its first argument, while f2 does not. This is because the type of the first argument 
of the function f1 is not conserved according to Definition 6.2, while both f2 and the internal 
function h2 are considered to be type-conserving. This type mismatch shows up in the binding 
for g1 which is flagged as a type-error in our augmented type system. However, the binding for 
g2 may be typed without any problem because the type-hint required by the function f1 has 
already been inserted. 
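
The mismatch can be reproduced with a small structural unifier, sketched here in Python. The representation is an assumption made for illustration: an augmented type ρ.τ is written as the tuple ("hint", h_1, ..., h_k, τ), so hint sets are ordered and of fixed size; type-variables are strings; the occurs check is omitted for brevity.

```python
class Mismatch(Exception):
    pass

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Plain structural unification; hint sets are just more structure."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return {**s, a: b}
    if isinstance(b, str):
        return {**s, b: a}
    if a[0] != b[0] or len(a) != len(b):   # e.g. empty vs non-empty hint set
        raise Mismatch(f"{a} vs {b}")
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
    return s

def strip_hints(t):
    """Erase hint sets, giving back the ordinary Hindley/Milner type."""
    if isinstance(t, str):
        return t
    if t[0] == "hint":
        return strip_hints(t[-1])
    return (t[0],) + tuple(strip_hints(x) for x in t[1:])

# Instantiated signatures of f1 and f2 from Example 7.1.
f1 = ("hint", ("->", "a0", ("hint", "a0", ("->", "a1", ("hint", "a1")))))
f2 = ("hint", ("->", "b0", ("hint", ("->", "b1", ("hint", "b1")))))

print(unify(strip_hints(f1), strip_hints(f2), {}))   # succeeds without hint sets
try:
    unify(f1, f2, {})
except Mismatch as err:
    print("type error in g1:", err)
```

With the hint sets erased, the two branches of g1 unify as in the usual system; with them in place, the non-empty set {t0} after the first argument of f1 cannot be matched against the empty set of f2.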

The above example shows that our type system makes a subtle distinction between implicitly 
curried multi-arity functions such as f1 and their explicitly curried counterparts such as f2. To 
be precise, this difference shows up only in non-type-conserving functions as shown in the above 
example; type-conserving functions would always have empty type-hint sets. This difference 
exposes an important run-time characteristic of such functions: the number of applications 
after which a function closure expands into an activation frame, which is controlled by their 
syntactic arity and not by their semantic typing. 

In a way, this difference is to be expected because f1 and f2 carry different objects within 
the closures resulting from their first application and hence require different amounts of type 
reconstruction information. Since f1 is implicitly curried, it simply records its first argument 
within its closure. This forces the type conservation mechanism (Definition 6.2) to insert 
additional type information in order to ensure subsequent type reconstruction of this argument 
even if it was never used within the function’s body. On the other hand, f2 produces an entirely 
new closure h2 on its first application that is completely independent of its first argument. So, 
there is no need to preserve its type within the returned closure. 

It should be noted that the type mismatch between f1 and f2 is generated not merely 
because we have chosen to represent the type-hint information explicitly within the type signatures 
of these functions. This type mismatch is actually a consequence of the underlying 
compilation mechanism that treats additional type-hints just like any other function parameters. 
In particular, it would not be possible to correctly compile the binding for g1 even if the 
type-hint analysis was done after the type-checking phase. This is because under our current 
compilation strategy, only f1 requires a type-hint, which is determined only after it is applied 
to a particular argument. 


Multiple Type Signatures 


Another interesting difference between this type system and the usual Hindley/Milner type 
system shows up with higher-order functions that take other functions as arguments. Consider 
the map function shown earlier in Example 6.1 which is reproduced below: 


Example 7.2: 
def map f nil    = nil                % map :: ∀t0 t1. φ.(t0 → φ.t1) → φ.((list t0) → φ.(list t1)) 
  | map f (y:ys) =                    % map :: ∀t0 t1. φ.(t0 → {t0}.t1) → φ.((list t0) → {t0}.(list t1)) 
      (f y):(map f ys); 

def enlist x = x:nil; 
g1 = map enlist (1:nil);              % No type-hint needed. 

def ignore x y = y; 
g2 = map ignore (1:nil) (int);        % Type-hint needed internally by ignore. 


Two possible types for the map function are shown. The first type assumes that the input 
function f is type-conserving and therefore would not need any type-hints when it is applied 
within the body of map. This permits type-conserving functions such as enlist to be passed 
to map as usual. The second type signature assumes that the incoming function would not be 
type-conserving and would need a type-hint at its application within the body of map. This 
type-hint propagates up to the definition of the map function and shows up in its type signature 
after the second argument. This allows non-type-conserving functions such as ignore to be 
passed as arguments to map. 

It may be a little disconcerting to note that the map function no longer has a single type. 
On the other hand, the two versions of map are truly different functions and must be compiled 
as such—one that propagates type-hints and the other that does not. One can think of the 
original Hindley/Milner type signature of the map function as being overloaded with the various 
intended versions. The compiler may selectively produce these specialized versions according 
to the type of the arguments supplied to map. 


Alternate Compilation Scheme 


Both problems presented above may be fixed by making the type-hint compilation more 
uniform and transparent to the standard parameter passing mechanism. In this section, we 
briefly examine one such compilation scheme. 

Instead of inserting type-hints required at a given argument position as additional parameters, 
we may put them in a separate record and pass a single pointer to that record to a fixed 
entry point identified by that argument position. Effectively, this adds one additional parameter 
for every argument position whether or not any type-hints are needed at that position. The 
advantage of this compilation scheme is that it completely dissociates propagation of type-hints 
from regular parameter passing, although it incurs additional frame space and time overhead in 
allocating type-hint records. In this scheme, empty type-hint records need not be propagated 
at all, while non-empty type-hint records may be passed even to type-conserving functions that 
do not require this information. The latter property fixes the problem of compiling g1 shown 
in Example 7.1. Now, a type-hint record would be created for each application site of g1 which 
would be used by f1 during type reconstruction but would be simply ignored by f2. 

This scheme also makes the compilation of higher-order functions such as the map function 
of Example 7.2 more uniform. Now the map function may be compiled to always propagate the 
type-hint record it receives from its first argument position to its internal application site. If 
no actual type-hint record is supplied from outside then this mechanism essentially propagates 
an empty type-hint record to the internal application site. However, specialized versions of map 
that do not pay this overhead may still be compiled as an optimization. 
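
A rough Python model of this alternate scheme is given below. It is only a sketch under assumptions: hint records are dictionaries, every argument position accepts an optional record, and the attributes captured and hint_record stand in for the contents of a closure. None of these names come from the Id compiler.

```python
def f1(x, hints=None):
    # Implicitly curried two-argument function: the closure produced by the
    # first application keeps x, so it also keeps the hint record describing x.
    def closure(y, hints_y=None):
        return y
    closure.captured = {"x": x}
    closure.hint_record = hints
    return closure

def f2(x, hints=None):
    # Explicitly curried: returns a fresh closure that is independent of x,
    # so the incoming hint record is simply ignored.
    def h2(y, hints_y=None):
        return y
    h2.captured = {}
    h2.hint_record = None
    return h2

def apply_g1(choose_f1, arg, arg_type):
    """Application site of g1 = (if ... then f1 else f2)."""
    g1 = f1 if choose_f1 else f2
    # The same hint record is passed whichever function was selected.
    return g1(arg, {"t0": arg_type})

c1 = apply_g1(True, 1, "int")
c2 = apply_g1(False, 1, "int")
print(c1.hint_record, c2.hint_record)
```

Because the record travels through its own slot, the application site no longer needs to know which of the two functions it is calling, which is what makes the binding for g1 compilable.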


7.3 Run-time Type Reconstruction 


7.3.1 Type Reconstruction Requirements 


Before we describe our type reconstruction algorithm, we summarize the requirements for full 
type reconstruction as discussed in previous sections. We use both compile-time and run-time 
information. 


1. The compile-time information consists of the type-map (Definition 6.1), the hint-map 
(Definition 6.3) and the arity of each function that is stored in the symbol table entry for 
that function. 


2. Furthermore, every function in the program must be transformed as shown in Section 7.2 
to propagate explicit type-hints for its non-conserved type-variables. 


3. The run-time information consists of the global dynamic environment and the root frame 
of the activation tree that remain live and are assumed to be accessible throughout the 
computation (Section 6.2.1). The activation tree hangs from the root activation frame 
and is modified dynamically, as the program executes, by the procedure linkage code. 


4. At program invocation time, complete type information is available for the user query 
expression and the root activation frame (Section 6.2.1). Therefore, the root frame should 
already be marked as reconstructed. 


5. Given any activation frame, we should be able to identify the function associated with 
it, its parent activation frame, and the application site in the parent frame that created 
this activation frame.⁵ Typically, the conventional return address information within the 
callee is sufficient for this purpose. 


6. A proper decoding mechanism should exist for types and type schemes encoded as type- 
hints (Figure 7.3). A run-time mechanism for type unification is also required, although 
it can be simplified considerably since static type-checking guarantees that unifications 
performed within the reconstruction algorithm cannot fail. 
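
Because failure is ruled out statically, the run-time unifier of item 6 can drop its error paths. The following is a minimal Python sketch of that idea, not the thesis's implementation. The type representation (a type-variable is a string, a constructed type is a tuple headed by its constructor name) and the names `resolve`, `unify`, and `apply_subst` are assumptions made for illustration.

```python
# Assumed representation: a type-variable is a str such as "b";
# a constructed type is a tuple such as ("list", "b") or ("int",).

def resolve(t, subst):
    """Follow variable bindings until an unbound variable or a constructor is reached."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t


def unify(t1, t2, subst=None):
    """Return a substitution (dict) under which t1 and t2 are equal.

    The two types are assumed to be unifiable, as static type-checking
    guarantees, so there is no occurs check and no failure path.
    """
    subst = {} if subst is None else subst
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, str):
        subst[t1] = t2
    elif isinstance(t2, str):
        subst[t2] = t1
    else:
        # Same constructor and arity by assumption: unify the arguments pairwise.
        for a, b in zip(t1[1:], t2[1:]):
            unify(a, b, subst)
    return subst


def apply_subst(subst, t):
    """Apply a substitution to a type."""
    t = resolve(t, subst)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(apply_subst(subst, a) for a in t[1:])


# Unify the defined type b -> list b with the use type int -> list int.
s = unify(("->", "b", ("list", "b")), ("->", ("int",), ("list", ("int",))))
```

After the call, `apply_subst(s, ("list", "b"))` yields the type for a list of integers. A general-purpose unifier would additionally need a constructor-clash check and an occurs check.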


7.3.2 The Reconstruction Algorithm 


Figure 7.5 shows the pseudo-code for the reconstruction algorithm RECONSTRUCT-FRAME 
which is invoked at run-time to reconstruct the types of all variables in a given activation 
frame. RECONSTRUCT-FRAME takes an activation frame as a parameter and returns a fully 
instantiated type-map for that frame. For ease of presentation, the algorithm makes use of 
several auxiliary functions which we will explain as we go along. 


5We ignore the issue of “tail calls” whose compilation was discussed in Section 6.5.


RECONSTRUCT-FRAME(activation-frame)
      ▷ Return if already reconstructed.
 1    if FRAME-RECONSTRUCTED(activation-frame)
 2       then return FRAME-TYPE-MAP(activation-frame)
      ▷ Otherwise, start reconstruction.
 3    activation-fn ← ACTIVATION-FN(activation-frame)
      ▷ Copy the function’s type-map.
 4    type-map ← TYPE-MAP(activation-fn)
 5    {α1,...,αn} ← F(type-map)
 6    S_copy ← {αi ↦ βi} where β1,...,βn are new.
      ▷ Process the type-hints.
 7    hint-map ← HINT-MAP(activation-fn)
 8    S_hint ← { forall (α ↦ x) in hint-map
 9                  v ← FETCH-ARGUMENT(x, activation-frame)
10                  σ ← TDec[v]
11                  collect (S_copy α ↦ σ) }
      ▷ We are done if the type-map is fully instantiated.
12    if F(S_hint S_copy(type-map)) = ∅
13       then FRAME-TYPE-MAP(activation-frame) ← S_hint S_copy(type-map)
      ▷ Otherwise, obtain call site information from the parent.
14       else { parent-activation-frame ← PARENT-ACTIVATION-FRAME(activation-frame)
15              parent-type-map ← RECONSTRUCT-FRAME(parent-activation-frame)
16              τ_use ← USE-TYPE(activation-frame, parent-type-map)
17              τ_def ← DEF-TYPE(activation-fn, S_copy(type-map))
18              if FULL-APP(activation-frame, parent-type-map)
19                 then S_def-use ← UNIFY(τ_def, τ_use)
20                 else { k ← ARITY(activation-fn)
21                        τ1 → ··· → τk → τk+1 ← τ_def
22                        S_def-use ← UNIFY(τk → τk+1, τ_use) }
23              FRAME-TYPE-MAP(activation-frame) ← S_def-use S_hint S_copy(type-map) }
24    FRAME-RECONSTRUCTED(activation-frame) ← true
25    return FRAME-TYPE-MAP(activation-frame)


Figure 7.5: The Type Reconstruction Algorithm.

RECONSTRUCT-FRAME is divided into several sections. We begin at Line 1 by checking if
the given activation frame has already been reconstructed. If so, the previously recorded frame 
type-map is returned immediately. Otherwise, we initiate the reconstruction process. 

The first section, Lines 3-6, initializes the data-structures used in the reconstruction. We 
extract the name of the current activation function from the given frame using the selector 
function ACTIVATION-FN and instantiate its type-map with fresh type-variables by building a 
type substitution $S_{copy}$ for its free type-variables. This is necessary so that types from multiple
activations of the same polymorphic function do not inadvertently interfere with each other. 

The next section, Lines 7-11, builds a type substitution $S_{hint}$ for all the non-conserved
type-variables of the function as prescribed by its hint-map. The type-hint corresponding to 
each hint parameter present in the hint-map is fetched from the activation frame and then 
decoded according to Figure 7.3. The resulting type schemes are the run-time instantiations of 
the non-conserved type-variables present in the hint-map. 
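
The hint-processing step can be sketched as follows in Python. Figure 7.3 is not reproduced here, so the encoding is an assumption modeled on the `tc "bool" nil` hint values used in this chapter's examples: a hint is taken to be a triple `("tc", name, [argument hints])`. The names `decode_hint` and `build_hint_substitution` are likewise hypothetical.

```python
def decode_hint(hint):
    """Decode an encoded type-hint into a type.

    Assumed encoding: ("tc", name, [arg_hint, ...]) stands for the type
    constructor `name` applied to the decoded argument hints. The decoded
    type is a tuple headed by the constructor name.
    """
    tag, name, args = hint
    if tag != "tc":
        raise ValueError(f"unknown hint tag: {tag!r}")
    return (name,) + tuple(decode_hint(a) for a in args)


def build_hint_substitution(hint_map, frame_slots, s_copy):
    """Build S_hint in the manner of Lines 7-11 of Figure 7.5.

    hint_map:    non-conserved type-variable -> hint parameter name
    frame_slots: hint parameter name -> encoded hint stored in the frame
    s_copy:      original type-variable -> fresh type-variable
    """
    return {
        s_copy[alpha]: decode_hint(frame_slots[x])
        for alpha, x in hint_map.items()
    }


# A frame whose hint slot holds the encoding of "list of bool".
s_hint = build_hint_substitution(
    {"a": "f3_hint_1"},
    {"f3_hint_1": ("tc", "list", [("tc", "bool", [])])},
    {"a": "a_fresh"},
)
```

The resulting substitution binds the fresh copy of the non-conserved variable, not the original variable, which keeps activations of the same polymorphic function apart.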

Following this, Line 12 checks to see if all free type-variables of the type-map have been 
instantiated to either ground or polymorphic types. If so, the reconstruction is complete and 
the fully instantiated type-map is recorded at Line 13. The FRAME-RECONSTRUCTED flag is 
set at Line 24 and the reconstructed type-map is returned at Line 25. 

If the test fails at Line 12, Lines 14-22 obtain the remaining information from the activation
tree as follows. First, the type-map of the parent of the current activation is reconstructed by
calling RECONSTRUCT-FRAME recursively with the parent’s activation frame. Using this
type-map and the current activation, the auxiliary function USE-TYPE obtains the reconstructed
type-instance of the call site responsible for invoking the current function (see item 5 of
Definition 6.1). This type-instance, $\tau_{use}$, is then unified with the defined type of the current function,
$\tau_{def}$, which is available within the current type-map. This unification fully instantiates all the
remaining type-variables in the current type-map, which is recorded at Line 23 and returned
at Line 25 as before.

The matching of $\tau_{def}$ to $\tau_{use}$ is slightly complicated by the fact that the current activation
could either be a full application of a $k$-arity function to all its arguments or it could simply
be the final ($k$-th) application of a curried function closure that has already accumulated $k-1$
arguments in previous partial applications. The recorded application site type instance $\tau_{use}$
would be different in these two cases and therefore it must be properly aligned before matching
with the function’s full type signature $\tau_{def}$.⁶ This application information is also recorded within
the parent’s type-map and is obtained at Line 18 using the auxiliary function FULL-APP. In
case of a full application, $\tau_{use}$ is directly unified with $\tau_{def}$, recorded in the current function’s
type-map, at Line 19. In case of a curried application, $\tau_{use}$ must be unified with just the final
application type $\tau_k \rightarrow \tau_{k+1}$ of the defined type $\tau_{def}$ as shown at Line 22.
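
The walk-through above can be condensed into a small executable sketch. It is an illustration under several assumptions, not the thesis's implementation: types are tuples and type-variables are strings, hints arrive already decoded, the defined type is stored in the type-map under the function's own name, and the `Function` and `Frame` records with their field names are hypothetical. For brevity the sketch instantiates every entry of the type-map eagerly.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

_fresh = itertools.count()

def walk(t, s):
    """Follow bindings of a type-variable (a str) through substitution s."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def apply_subst(s, t):
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(apply_subst(s, a) for a in t[1:])

def free_vars(t):
    if isinstance(t, str):
        return {t}
    return set().union(*[free_vars(a) for a in t[1:]])

def unify(t1, t2, s):
    """Extend s so that t1 and t2 agree; assumed never to fail."""
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return
    if isinstance(t1, str):
        s[t1] = t2
    elif isinstance(t2, str):
        s[t2] = t1
    else:
        for a, b in zip(t1[1:], t2[1:]):
            unify(a, b, s)

@dataclass
class Function:
    name: str
    arity: int
    type_map: dict   # identifier -> static type; type_map[name] is the defined type
    hint_map: dict   # non-conserved type-variable -> hint parameter name

@dataclass
class Frame:
    fn: Function
    slots: dict                       # hint parameter name -> already decoded type
    parent: Optional["Frame"] = None
    site: Optional[str] = None        # key of the application site in the parent's type-map
    full_app: bool = True
    type_map: Optional[dict] = None   # set once the frame is reconstructed

def instantiate(fn, s):
    return {ident: apply_subst(s, t) for ident, t in fn.type_map.items()}

def reconstruct_frame(frame):
    if frame.type_map is not None:                         # Lines 1-2
        return frame.type_map
    fn = frame.fn
    tvars = set().union(*[free_vars(t) for t in fn.type_map.values()])
    s = {a: f"{a}#{next(_fresh)}" for a in sorted(tvars)}  # S_copy, Lines 3-6
    for alpha, x in fn.hint_map.items():                   # S_hint, Lines 7-11
        s[s[alpha]] = frame.slots[x]
    inst = instantiate(fn, s)
    if any(free_vars(t) for t in inst.values()):           # Line 12
        parent_map = reconstruct_frame(frame.parent)       # Lines 14-15
        t_use = parent_map[frame.site]                     # Line 16
        t_def = inst[fn.name]                              # Line 17
        if not frame.full_app:                             # Lines 18-22
            for _ in range(fn.arity - 1):                  # keep only tau_k -> tau_(k+1)
                t_def = t_def[2]
        unify(t_def, t_use, s)
        inst = instantiate(fn, s)
    frame.type_map = inst                                  # Lines 13, 23, 24
    return inst                                            # Line 25

# Modeled on h3 from the f3/g3 example: h3 receives its hint inside f3
# (a partial application) and its final argument 2 at the site "g3 2".
INT, BOOL, HINT = ("int",), ("bool",), ("hint",)
h3 = Function("h3", 2,
              {"h3": ("->", HINT, ("->", "b", ("list", "b"))),
               "x": ("list", "a"), "z": "b"},
              {"a": "h3_hint_1"})
root = Frame(Function("root", 0, {}, {}), {},
             type_map={"g3 2": ("->", INT, ("list", INT))})
frame = Frame(h3, {"h3_hint_1": BOOL}, parent=root, site="g3 2", full_app=False)
result = reconstruct_frame(frame)
```

Here the hint instantiates the non-conserved variable of `x`, while the curried final application aligns only the last arrow of the defined type with the use type and thereby instantiates the type of `z`. The placeholder type `HINT` for the hint parameter is an assumption of the sketch.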


7.3.3 Reconstruction Complexity 


A few observations about the reconstruction algorithm are worth pointing out. First, the entire 
activation frame of a function is reconstructed at once. This is possible because the types of all 
the objects present in an activation frame share the same set of free type-variables which are 
precisely captured and instantiated using its type-map. This obviates the need to traverse the 
activation tree multiple times in order to reconstruct the types of various identifiers belonging 
to the same frame. 

Second, we cache the reconstructed type-maps of all activation frames for future reference
by their child frames. Therefore, no activation frame needs to be reconstructed more than
once. Furthermore, since the root frame is already marked as reconstructed at the start of
the program, the algorithm is guaranteed to terminate properly as it recursively climbs the
activation tree at Line 15.


6In our earlier paper [AC93], this operation was abstracted into the auxiliary function UNIFY-ALIGNED.

Finally, the algorithm climbs the activation tree from the current activation frame only as far 
up as necessary. The climbing process terminates at the first ancestor frame that has already 
been reconstructed, or earlier if sufficient information is available via the type-hints. This 
avoids traversing the activation tree from the root activation frame to all its leaves as suggested 
in [Gol91] which would involve reconstructing all the activation frames within the dynamic 
activation tree. Our algorithm pays only incremental cost for each request for reconstruction, 
which is a very useful feature for interactive applications such as a source debugger. 

The cost of the algorithm RECONSTRUCT-FRAME shown in Figure 7.5 depends on the 
following factors: 


1. The number of ancestor frames reconstructed due to recursive calls to the algorithm 
RECONSTRUCT-FRAME at Line 15. 


2. The cost of decoding the type-hints at Lines 7-11. 
3. The cost of unification at Line 19 or Line 22. 


The maximum number of ancestor frames reconstructed in a given call to the algorithm 
RECONSTRUCT-FRAME is bounded by the number of frames occurring between the current 
activation frame and the root frame. In a sequential system, this is all the frames sitting on 
the stack. In a parallel system, this is the number of frames on any path from a leaf to the 
root in the dynamic activation tree which is only the depth of the dynamic activation tree and 
not its overall size. Of course, since all reconstructed type-maps are cached, the overall cost of 
reconstructing every frame within the dynamic activation tree is still linear in the total number 
of activation frames, assuming a unit cost for type unification and type-hint decoding. 
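
The effect of caching and of the early exit via type-hints on the number of frames visited can be illustrated with a toy model. This Python sketch only counts frames and performs no type manipulation; the `Frame` class and its `hints_suffice` flag are hypothetical stand-ins for the test at Line 12 of Figure 7.5.

```python
class Frame:
    def __init__(self, parent=None, hints_suffice=False):
        self.parent = parent
        # True when type-hints alone fully instantiate the type-map.
        self.hints_suffice = hints_suffice
        # The root frame is marked as reconstructed from the start.
        self.reconstructed = parent is None


def reconstruct(frame):
    """Reconstruct `frame` and return how many frames had to be processed."""
    if frame.reconstructed:
        return 0
    processed = 1
    if not frame.hints_suffice:
        processed += reconstruct(frame.parent)
    frame.reconstructed = True
    return processed


# A call chain of depth 5 below the root.
root = Frame()
chain = [root]
for _ in range(5):
    chain.append(Frame(parent=chain[-1]))

first = reconstruct(chain[-1])        # climbs through every unreconstructed ancestor
sibling = Frame(parent=chain[4])
second = reconstruct(sibling)         # the parent is already cached
lonely = Frame(parent=Frame(parent=root), hints_suffice=True)
third = reconstruct(lonely)           # hints suffice, so the parent is never visited
```

The first request pays for the whole path to the root, while later requests in the same region of the tree pay only for frames not yet reconstructed.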

The cost of decoding the type-hints depends on the number of non-conserved type-variables 
in the type-map and the size of their run-time type instantiations. Similarly, the cost of unifi- 
cation is proportional to the size of the function’s instantiated type. Although it is possible to 
write functions whose Hindley/Milner types are exponentially large compared to the size of the 
function itself [Mai90], such cases are rare. Typically, functions possess small type signatures 
that can be efficiently manipulated using graphical representations. Non-conserving functions 
are rare as well and run-time type instantiations of non-conserving type-variables are also small. 

The interesting observation here is that the cost of reconstructing a type-map for a given 
activation frame does not depend on the number of slots in the activation frame or the total size 
of the type-map itself, but only on the size of the type signature of the corresponding activation 
function. This is because we never need to examine or copy the type of every identifier recorded 
in the type-map during its reconstruction. We only instantiate its free type-variables.⁷
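
One way to realize this property is to represent a reconstructed type-map as the shared static type-map plus a per-frame substitution, and to apply the substitution only when a client asks for a particular identifier. The sketch below is a hypothetical Python illustration; the class `FrameTypeMap` and the tuple-based type representation are assumptions.

```python
def apply_subst(s, t):
    """Apply substitution s to type t (type-variables are str, types are tuples)."""
    while isinstance(t, str) and t in s:
        t = s[t]
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(apply_subst(s, a) for a in t[1:])


class FrameTypeMap:
    """A frame's reconstructed type-map: the shared static map plus one substitution."""

    def __init__(self, static_map, subst):
        self.static_map = static_map   # shared by all activations, never copied
        self.subst = subst             # instantiates the free type-variables only

    def type_of(self, ident):
        # The per-identifier cost is paid by the client, on demand.
        return apply_subst(self.subst, self.static_map[ident])


# A function with many local identifiers, all of type "list a".
static_map = {f"tmp{i}": ("list", "a") for i in range(1000)}
int_frame = FrameTypeMap(static_map, {"a": ("int",)})
bool_frame = FrameTypeMap(static_map, {"a": ("bool",)})
```

Creating `int_frame` or `bool_frame` costs the same regardless of the thousand entries in `static_map`, and because each frame keeps its own substitution, the two activations do not interfere.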


7.4 Correctness of the Type Reconstruction Algorithm 


In this section, we will show that the type reconstruction algorithm given in Figure 7.5 is correct, 
i.e., it infers the exact type for every object at any time during the execution of a program. 
We will define the notion of the exact type for an object shortly, but for the time being it 
may be viewed as the type that would have been attached to the object had we computed and 


7An independent program such as a debugger or a garbage collector may ultimately need to examine the 
reconstructed types of every element in the activation frame. That cost is not included in reconstruction. 




propagated source type information all through the execution of the program. In dynamically- 
typed languages such as Lisp, this is exactly how dynamic type-checking is performed. Every 
object is tagged with its type and that information is carried through each computation step. 
The type of every new object (including scalars such as integers and floats) is computed along 
with its value and is attached to the value as its tag. Of course, computing types is a substantial 
overhead during program execution which is why we have chosen to perform dynamic type 
reconstruction instead of dynamic type maintenance. 

The Kernel Id language (Figure 7.1), its run-time execution model (Figure 6.2), and the 
type reconstruction algorithm (Figure 7.5) are all quite complex. In order to be able to argue 
about the correctness of the algorithm, we make several theoretical simplifications. These 
simplifications allow us to model these concepts cleanly and distill the basic characteristics of 
the reconstruction algorithm. 


7.4.1 Simple Expression Language and its Semantic Model 


As the first step, we restrict ourselves to the simple expression language described in Chapter 3. 
This is because we have already made a considerable effort to rigorously define the static and 
dynamic semantics for this language. We already have an operational semantic model for 
this language (Definition 3.12) and we have shown the consistency between the static and the 
dynamic semantics (Theorem 3.16). This consistency is the main tool with which we will show
the correctness of our type reconstruction algorithm. It may be noted that the problem of
complete type reconstruction is independent of the issue of parallel or sequential execution. 
Therefore, restricting ourselves to a strict, sequential language instead of dealing with the fully 
parallel execution model of Id does not affect the reconstruction algorithm or the issue of its 
correctness. 

It is easy to see the correspondence between the Kernel Id language shown in Figure 7.1 and 
the simple expression language shown in Section 3.1.1. Most Kernel Id expressions have a direct
analogue in the simple expression language. The important simplifications are that mutually
recursive functions must be combined into a single self-recursive function, case-expressions
must be broken up into a series of conditional expressions, and blocks must be converted into
nested let-bindings of mutually recursive definitions.


7.4.2 Partial Execution and the Dynamic Activation Tree 


The second step is to model the state of the machine at the moment when type reconstruction 
is requested. In the relational formulation of the dynamic semantics shown in Section 3.1.2, 
an evaluation of a top-level query expression may be described by a logical derivation tree of 
evaluation judgments of the following form:⁸


$$e \vdash a/s \Rightarrow v/s'$$


The evaluation derivation tree for the top-level query provides a logical proof of how eval- 
uations of sub-expressions contribute towards the final result of the entire program according 
to the dynamic inference rules shown in Figure 3.1. We will now treat this derivation tree of 
evaluation judgment relations as representing the computation itself. The complete derivation 
tree for the top-level query corresponds to the entire program computation. Each judgment 


8We assume that the result of the evaluation is not err. This is because we assume that the entire
program including the top-level query expression is type-correct. Therefore, by the Soundness Theorem 3.16, the
evaluation can never run into a run-time type-error.


e0 |- (f3 = ...; g3 = ...; g3 2)/s0 => _/_


e0 |- (f3 f3_hint_1 x = ...)/s0 => <clsr f3...>/s0


e0+{f3 -> <clsr f3...>}=e1 |- (g3 = ...; g3 2)/s0 => _/_


e1 |- false/s0 => false/s0
e1 |- f3 (tc "bool" nil) (true:nil)/s0 => <clsr h3...>/s0
e1 |- f3/s0 => <clsr f3...>/s0
e1 |- (tc "bool" nil)/s0 => <tc "bool" nil>/s0
e1 |- (true:nil)/s0 => <cons true nil>/s0
e2 |- g3/s0 => <clsr h3...,e3>/s0
e2 |- 2/s0 => 2/s0


e3+{z -> 2}=e4 |- (if length x == 1 then ... else ...)/s0 => _/_


Figure 7.6: The Evaluation Derivation Tree for Example 7.3. 


in this tree may be considered as providing a place-holder for the initial store and the final
result (value and a new store) computed within that judgment. The store is sequentialized 
through the entire tree in a predictable depth-first fashion, while the values propagate from 
the leaves of the tree towards the root—the value of the top-level query being the value of the 
whole computation. Values may also be passed from one branch of the tree to the other via 
the environment. 

The overall process of evaluation may be viewed as a step-wise unfolding of the evaluation 
derivation tree. We start with the top-level query evaluation judgment using the initial dynamic 
environment, the initial store, and an empty place-holder for the result. In order to compute 
the overall result, the top-level evaluation judgment unfolds into a set of antecedent judgments 
needed by the dynamic inference rule that is selected according to the immediate structure of 
the query expression. Each such unfolding creates empty place-holders for the results of inter- 
mediate evaluation judgments. On reaching the leaves, values are created spontaneously using 
CONST, IDENT, or ABS rules, and are used to fill the place-holders for the leaf judgments. On 
each successive computation step, these values fill the place-holders of their parent judgments, 
until they reach an inference rule with multiple antecedents such as APP, TUPLE, or LET rules, 
in which case a new sub-tree of evaluation judgments is spawned. 

As an example of this process, consider the computation shown in Example 6.8 which is 
reproduced below: 


Example 7.3: 
def f3 f3_hint_1 x =
  { def h3 h3_hint_1 z = if length x == 1
                         then z:nil
                         else z:z:nil;
    in h3 f3_hint_1 };
g3 = if ...
     then f3 (tc "int" nil) (1:nil)
     else f3 (tc "bool" nil) (true:nil);
g3 2;


This program has been translated according to the scheme presented in Section 7.2 with the 
appropriate type-hints added. The evaluation derivation tree for this computation is depicted in 
Figure 7.6. Each incomplete evaluation judgment in the derivation tree is expanded downwards 
into its antecedent judgments according to the dynamic inference rules of Figure 3.1. Not all 
branches of the derivation tree have been expanded yet. An empty place-holder (_) is used to 
represent an unknown value or a store within incomplete or unexplored judgments. In addition, 
we also collapse the sub-trees that have been fully evaluated (shown in light typeface) up to 
the highest completed evaluation judgment. 

Such a partially expanded evaluation derivation tree may be used to model the exact state 
of a computation at any given point in time: 


Definition 7.1 (Partial Execution Tree) A partial execution tree is a structural tree-prefix⁹
of the complete evaluation derivation tree for the top-level query expression with the following
characteristics:


1. Each node consists of a possibly incomplete evaluation judgment of the form
$e \vdash a/s \Rightarrow v/s'$.


2. Sub-trees consisting entirely of complete evaluation judgments are collapsed into a leaf
judgment $e \vdash a/s \Rightarrow v/s'$ corresponding to the highest evaluation judgment that has
received its value. These nodes represent terminated computation.


3. Internal evaluation judgments $e \vdash a/s \Rightarrow \_/\_$ that have been expanded but not yet fully
evaluated contain empty place-holders (_) to receive their values. These nodes represent
the active machine state.


4. Unexpanded judgments $e \vdash a/\_ \Rightarrow \_/\_$ are also represented by a leaf with empty
place-holders. These nodes represent the computation to be spawned in the future.
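
The three kinds of nodes in Definition 7.1 can be made concrete with a small data structure. The following Python sketch is illustrative only: the `Judgment` record, its field names, and the use of `None` for the empty place-holder are assumptions, and the example tree is a hypothetical fragment loosely modeled on Figure 7.6.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

EMPTY = None  # models the empty place-holder (_)


@dataclass
class Judgment:
    env: str
    expr: str
    store_in: Optional[str] = EMPTY
    value: Any = EMPTY
    store_out: Optional[str] = EMPTY
    children: List["Judgment"] = field(default_factory=list)

    def status(self) -> str:
        if self.store_in is EMPTY:
            return "unexpanded"      # item 4: computation to be spawned later
        if self.value is EMPTY or self.store_out is EMPTY:
            return "active"          # item 3: expanded but not yet evaluated
        return "terminated"          # item 2: collapsed, value received


def active_judgments(node):
    """Collect the judgments that make up the active machine state."""
    found = [node] if node.status() == "active" else []
    for child in node.children:
        found.extend(active_judgments(child))
    return found


done = Judgment("e2", "2", "s0", 2, "s0")
future = Judgment("e4", "z:nil")
body = Judgment("e4", "if length x == 1 then ... else ...", "s0", children=[future])
top = Judgment("e0", "(f3 = ...; g3 = ...; g3 2)", "s0", children=[done, body])
```

Calling `active_judgments(top)` returns the root judgment and the conditional in the body of the applied function, which are exactly the nodes still waiting for a value.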


Note that if we model the store independently as an external data-structure rather than 
threading it sequentially through the judgments, we can model parallel computation within 
this framework by spawning several branches of the partial execution tree in parallel. The only 
modification needed in the current dynamic semantics to model this situation would be to use 
a least-upper-bound ($\sqcup$) operation on stores that would combine stores from various branches
of the execution tree into a single store. 

It is useful to draw a correspondence between the actual dynamic activation tree at any given 
time during the execution of a program and its partial execution tree as described above. This 
may be seen by comparing Figure 6.6 that shows the dynamic activation tree for Example 7.3 
with Figure 7.6 that shows its partial execution tree. The following correspondences emerge: 


9A sub-tree starting at the root of the original tree with some of its branches clipped is called a tree-prefix of
the tree.


1. The root frame of the dynamic activation tree corresponds to the root judgment of the
partial execution tree which is evaluating the top-level query expression provided by the 
user. 


2. The type of the query expression is completely known at the beginning of the computation 
which corresponds to the fact that the root frame in the dynamic activation tree is always 
marked as reconstructed. 


3. Each activation frame present in the dynamic activation tree corresponds to a subset of 
evaluation judgments within the partial execution tree that belong to the body of the 
applied function and hang from its application evaluation judgment. In other words, 
each application evaluation judgment within the partial execution tree may be viewed as 
initiating a new activation frame for the applied function. 


4. Collapsing evaluation sub-trees for completed evaluation judgments corresponds to the 
fact that the activation frames within that branch of the computation have been deallo- 
cated and just the final value is available within the current frame. 


With the above correspondence in mind, the partial execution tree serves as an accurate 
logical model of the actual dynamic activation tree. 


7.4.3 Type Reconstruction 


Given the definition of the dynamic state of the machine as a partial execution tree, type 
reconstruction may be viewed as the process of computing the exact type of each value present 
in the partial execution tree at any given time. Using the formal machinery at hand, this 
corresponds to generating a type derivation tree using the static semantics inference rules 
shown in Figure 3.2, that parallels the structure of the given partial execution tree. This is 
captured in the following definition: 


Definition 7.2 (Type Reconstruction) Type reconstruction of a given partial execution tree 
is defined to be a type derivation tree with the same structure as the partial execution tree with 
the following characteristics: 


1. For each evaluation judgment in the partial execution tree of the form $e \vdash a/s \Rightarrow v/s'$,
where $s$, $v$, and $s'$ may be empty place-holders, the type derivation tree has a corresponding
valid elaboration judgment of the form $E \vdash a : \tau$. Furthermore, the type $\tau$ is the most
general type satisfying this elaboration.


2. For each completed evaluation judgment of the form $e \vdash a/s \Rightarrow v/s'$ and the corresponding
typing judgment $E \vdash a : \tau$, there exists a store typing $S$ such that $S \models e : E$ and $\models s : S$.


Using the Soundness Theorem 3.16, the second condition in the above definition immediately
allows us to conclude that the computed value $v$ is consistent with the type $\tau$ under a suitably
constructed new store typing. In addition, the first condition ensures that this is the most
general type of the value $v$. Therefore, this type $\tau$ is taken to be the exact reconstructed type
of the value $v$.


7.4.4 The Type Reconstruction Algorithm


The reconstruction algorithm shown in Figure 7.5 reconstructs one activation frame at a time, 
although it may be applied to each frame within the current dynamic activation tree to re- 
construct the whole state of the machine. The actual order in which frames are reconstructed 
is not important, nor is the fact that we cache the reconstructed type-maps. Therefore, we 
will assume that all frames in the dynamic activation tree are reconstructed in one sweep that 
starts at the root frame and works its way downwards towards the leaf frames. This does not 
change the correctness problem because we are interested in showing the correctness of what 
the algorithm computes, not necessarily how it computes it. 

As shown earlier, we have modeled the dynamic activation tree as a partial execution tree 
(Definition 7.1), and the process of type reconstruction as constructing a type derivation tree 
for it (Definition 7.2). Therefore, all we need to do now is to show that our type reconstruc- 
tion algorithm given in Figure 7.5 indeed constructs a valid, most general type derivation tree 
according to Definition 7.2. To accomplish this, we need to abstract the reconstruction algo- 
rithm in terms of traversing the partial execution tree and constructing the corresponding type 
derivation tree. 

The first observation to be made about the reconstruction algorithm shown in Figure 7.5 
is that it reconstructs an entire frame at a time by instantiating the static type-map of the 
function corresponding to that frame. The static type-map of a function corresponds to the 
most general, static type derivation tree of its body. This is because the static type-map records 
the compile-time type of every sub-expression and free identifier occurring within the body of
the function (Definition 6.1). Furthermore, these types are computed using the type inference 
algorithm Infer mentioned in Section 3.4. The soundness of this algorithm (Proposition 3.19) 
ensures that we can construct a valid type derivation tree for the entire body of the function, 
while its completeness (Proposition 3.20) ensures that we obtain the most general type for each 
sub-expression. 

Thus, instantiating the static type-map of a function with a substitution can be viewed as 
instantiating the entire static type derivation tree of the function body with that substitution. 
The validity of the derivation tree after substitution is ensured by the stability of typing judg- 
ments under substitution (Proposition 3.9). Note that the structure of the instantiated type 
derivation tree matches the portion of the partial execution tree that corresponds to the activa- 
tion frame being reconstructed. Sub-trees that are completely evaluated and hence have been 
collapsed to a leaf in the partial execution tree may also be collapsed in the typing derivation 
tree. 

The second observation about the reconstruction algorithm is regarding the construction
of the instantiating substitution $S_{def\text{-}use} S_{hint} S_{copy}$ for the callee’s type-map. The purpose of
this substitution is to fully instantiate the static type-map of the callee according to the types
of its actual arguments and the result, so that the corresponding type derivation tree for the
callee’s body matches the application site in the caller’s derivation tree. The two independent
components¹⁰ $S_{def\text{-}use}$ and $S_{hint}$ are responsible for two different sets of arguments supplied to the
callee. The substitution $S_{def\text{-}use}$ conveys the type instantiation information due to the arguments
and the result present at the final application site, while the substitution $S_{hint}$ provides the
instantiation information due to the arguments supplied at previous partial application sites.

The compiler support for type-hint generation and propagation (Section 7.2) provides the 
mechanism by which we make the relevant type information available at the final application 


10The third component $S_{copy}$ simply serves to make a copy of the type-map and therefore does not provide
any additional information. 




site. The most important property of this mechanism is type conservation (Definition 6.2) 
which ensures that the exact type instantiation for every type-variable within a function’s 
type-map is preserved at each of its application sites. For non-conserved type-variables, the 
type-hint generation and propagation phase described in Section 7.2.3 encodes their dynamic 
type instantiations at each partial application site and stores them within the returned closure. 
This ensures that these type instantiations would remain accessible in encoded form even when 
the computation that produced the closure has terminated. The substitution $S_{hint}$ during type
reconstruction represents these type-variable instantiations. For conserved type-variables, the
type of the arguments present at the final application site within the type derivation tree of
the caller provides their exact instantiation. The substitution component $S_{def\text{-}use}$ captures
these instantiations. Type conservation (Definition 6.2) guarantees that together these two 
substitutions fully instantiate all the type-variables present within the callee’s type-map in 
accordance with the types of the actual arguments and the result of the function application. 

As discussed above, the reconstruction algorithm ensures that the instantiated type deriva- 
tion tree computed for the callee’s body matches its application site within the type derivation 
tree of the caller’s body. This process effectively “glues” the instantiated type derivation tree 
of the callee’s body at the APP rule within the caller’s type derivation tree, producing a single 
typing derivation tree that structurally corresponds to the partial execution tree across this ap- 
plication. Below, we formalize the construction of the type derivation tree in the above manner 
and show its consistency with respect to the current partial execution tree. 


7.4.5 Correctness of the Algorithm 


We model the entire computation including the initial program loading/linking phase using a 
partial execution tree given by Definition 7.1. The program loading and linking phase constructs
the static and the dynamic environments within which the top-level query expression is evaluated.
This is not part of the real reconstruction process because it is performed before initiating the 
execution of the top-level query expression. But, in our theoretical formulation it is simpler 
to start with empty static and dynamic environments, and an empty store that are consistent 
with each other by definition. 

Each loading/linking step adds a new let-binding to the partial execution tree and the 
type derivation tree, adding its type and value to the static and the dynamic environments 
respectively. Since each binding is well typed, it follows from the Soundness Theorem 3.16 that
we end up in a store typing $S_0$ such that each top-level binding value is consistent with its
corresponding type, and that the static environment $E_0$, the dynamic environment $e_0$, and the
store $s_0$ obtained after loading/linking are also consistent:


$$S_0 \models e_0 : E_0 \quad \text{and} \quad \models s_0 : S_0 \tag{7.1}$$


Now we are ready to show that the reconstruction algorithm of Figure 7.5 is correct, i.e.,
given a logical partial execution tree as given by Definition 7.1, it computes the corresponding 
logical type derivation tree as given by Definition 7.2. 


Theorem 7.3 (Correctness of Type Reconstruction) The reconstruction algorithm shown 
in Figure 7.5, when applied to the complete dynamic activation tree at any time during program 
execution, produces the exact types for every value computed until that time. 


Proof: By induction on the size of the partial execution tree (Definition 7.1). Since the
top-level query is guaranteed to be well-typed, we start with its type derivation tree. Looking
at the static inference rules shown in Figure 3.2 and the dynamic inference rules shown in
Figure 3.1, it is clear that the structure of the type derivation tree must correspond to the
partial execution tree except possibly at the APP or the ABS rules where the number of
judgment antecedents differs between the static and the dynamic rules. We recurse down the
partial execution tree and the current type derivation tree simultaneously in a depth-first, 
leftmost-first manner, arguing by case analysis on the inference rules that lead to completed 
evaluation judgments. 

Case 1: Rules other than ABS or APP. Equation 7.1 shows that we start with a consistent
set of environments and an initial store. For each sub-expression that has been evaluated
in sequence, the Soundness Theorem 3.16 guarantees that its value $v_i$ present in the partial
execution tree would be consistent with the corresponding type $\tau_i$ present in the type
derivation tree. Furthermore, we can construct a chain of extensions to the initial store
typing, $S_i$ extending $S_{i-1}$ extending $\cdots$ $S_0$, each of which would be consistent with the
corresponding store $s_i, s_{i-1}, \ldots, s_0$. If any of these intermediate values entered the dynamic
environment through a let-binding, then the static environment $E_i$ and the dynamic
environment $e_i$ so obtained would also be consistent by construction. Therefore, for each
sub-expression evaluation judgment we have,


$$S_i \models v_i : \tau_i \qquad S_i \models e_i : E_i \qquad \models s_i : S_i \tag{7.2}$$


Case 2: ABS Rule. Here, we simply clip the type derivation tree at the abstraction typing
judgment in order to emulate the structure of the partial execution tree which produces a
function closure immediately. The type-correctness of the function body ensures a consistent
static type for the closure by definition of $\models$ (Definition 3.12).


Case 3: APP Rule. This is the interesting case of type reconstruction. By the induction
hypothesis, the function and the argument expressions evaluate to a closure and a value
respectively that are consistent with their types present in the type derivation tree.
Furthermore, suppose the base function $f$ present within the closure¹¹ has arity $k$ with formal
parameters $x_1 \cdots x_k$. We need to consider two cases: partial application of the closure to
one more argument, and the final application of the closure that generates a new activation
frame.


If the current application is a partial application of a closure 
$(\mathsf{clsr}\ f^i, x_{k-i+1}, a_f, e_f^{k-i})$ to the value $v_{k-i+1}$, then it immediately produces 
another closure $(\mathsf{clsr}\ f^{i-1}, x_{k-i+2}, a_f, e_f^{k-i+1})$ where 
$e_f^{k-i+1} = e_f^{k-i} + \{x_{k-i+1} \mapsto v_{k-i+1}\}$. The type consistency of this value with respect 
to the result closure type recorded in the type derivation tree follows directly from induction 
hypothesis. The important point to note is that if some type-variable in the resulting 
closure type was not being conserved at this application site, then its exact type-hint would 
also have been supplied at this application site and stored within the closure environment 
$e_f^{k-i+1}$. 

¹ The simple expression language of Chapter 3 does not deal with multi-arity functions directly. Therefore, 
we assume that each multi-arity function $f$ with arity $k$ in the user program gives rise to a set of functions 
$f^k, f^{k-1}, \ldots, f^1$ that represent partially applied closures of $f$ accumulating one argument at a time within their 
environments $e_f^0, \ldots, e_f^{k-1}$. The superscript $i$ on the function $f^i$ denotes how many more arguments are needed 
before the evaluation of the body of the function $f$ is initiated. Likewise, the superscript $j$ on the environment 
$e_f^j$ denotes the number of arguments it has accumulated. 


Now suppose the function has already undergone $k - 1$ partial applications before this 
application to produce a function closure $(\mathsf{clsr}\ f^1, x_k, a_f, e_f^{k-1})$. Therefore, the dynamic 
APP rule in the caller’s partial execution tree looks like, 


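$$\frac{\begin{array}{l}
e_i \vdash a_1/s_i \Rightarrow (\mathsf{clsr}\ f^1, x_k, a_f, e_f^{k-1})/s_{i+1} \\
e_i \vdash a_2/s_{i+1} \Rightarrow v_k/s_{i+2} \\
e_f^{k-1} + \{x_k \mapsto v_k,\ f \mapsto (\mathsf{clsr}\ f^k, x_1, a_f, e_f^0)\} \vdash a_f/s_{i+2} \Rightarrow \_/\_
\end{array}}{e_i \vdash (a_1\ a_2)/s_i \Rightarrow \_/\_}$$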


While the static APP rule in the caller’s type derivation tree constructed so far looks like, 


$$\frac{E \vdash a_1 : \tau_k \to \tau_{k+1} \qquad E \vdash a_2 : \tau_k}{E \vdash a_1\ a_2 : \tau_{k+1}}$$


We wish to construct an appropriate type derivation sub-tree that models the evaluation 
of the callee’s body. 

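From the induction hypothesis on the first two clauses and the Soundness Theorem 3.16, 
we obtain new store typings $S_{i+1}$ and $S_{i+2}$ such that, 


$$S_{i+1} \models (\mathsf{clsr}\ f^1, x_k, a_f, e_f^{k-1}) : \tau_k \to \tau_{k+1} \qquad S_{i+1} \text{ extends } S_i \qquad \models s_{i+1} : S_{i+1} \tag{7.3}$$


$$S_{i+2} \models v_k : \tau_k \qquad S_{i+2} \text{ extends } S_{i+1} \qquad \models s_{i+2} : S_{i+2} \tag{7.4}$$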


Looking at the definition of $\models$ (Definition 3.12), the first clause of Equation 7.3 guarantees 
that there exists a suitable type environment $E_f^{k-1}$ that is consistent with the closure 
environment $e_f^{k-1}$ and provides a proper typing for the function body. That is, 


$$S_{i+1} \models e_f^{k-1} : E_f^{k-1} \tag{7.5}$$

$$\text{and}\quad E_f^{k-1} \vdash (f^1\ \mathbf{where}\ f(x_1 \cdots x_k) = a_f) : \tau_k \to \tau_{k+1} \;\equiv\; E_f^{k-1} + \{x_k \mapsto \tau_k\} \vdash a_f : \tau_{k+1} \tag{7.6}$$


The job of the reconstruction algorithm is to construct the type environment $E_f^{k-1}$ and 
hence build the exact type derivation tree of the function body as given by Equation 7.6. 


At compile-time, the static type-map of the function $TM_f$ has already recorded the 
static type of all the parameters and free identifiers of the function $f$ (Definition 6.1). The 
reconstruction algorithm simply needs to instantiate this compile-time type environment 
to compute the actual type environment $E_f^{k-1}$ as discussed in Section 7.4.4 above. 
In particular, the algorithm uses the exact type $\tau_k$ of the final argument $x_k$ from the 
application site as well as type-hints contained within the closure environment $e_f^{k-1}$ that 
allow it to compute the exact types of all the non-conserved type-variables in the type-map 
$TM_f$. This completely instantiates the types of all the accumulated arguments 
$x_1 : \tau_1, \ldots, x_{k-1} : \tau_{k-1}$ and the free identifiers contained within the closure environment $e_f^{k-1}$. 


Having constructed the type environment $E_f^{k-1}$ as above, we can now instantiate the 
type derivation tree of the body $a_f$ as shown in Equation 7.6. Now it remains to be shown 
that this type derivation tree is consistent with the evaluation tree of the function body 
$a_f$. 

We have the following environments, 


$$e_f^k = e_f^{k-1} + \{x_k \mapsto v_k\} \qquad E_f^k = E_f^{k-1} + \{x_k \mapsto \tau_k\}$$


Note that all argument and free identifier values contained within the closure 
environment $e_f^{k-1}$ must be consistent with the type present within the instantiated type 
environment $E_f^{k-1}$ under the store typing $S_{i+1}$, i.e., the constructed environment $E_f^{k-1}$ 
satisfies Equation 7.5. This is because these values have been computed in the earlier 
part of the evaluation tree which we have already type reconstructed and verified for 
consistency (Equation 7.2). Since the current store typing $S_{i+2}$ extends $S_{i+1}$, we have 
$S_{i+2} \models e_f^{k-1} : E_f^{k-1}$ which is combined with Equation 7.3 and Equation 7.4 to give 
$S_{i+2} \models e_f^k : E_f^k$. Together with Equation 7.4 and Equation 7.6, we obtain via the Soundness 
Theorem 3.16 that the evaluation of the function body $a_f$ will be consistent with its 
type elaboration. 


Thus, we have successfully reconstructed a consistent type derivation tree shown in 
Equation 7.6 for the expansion of the partial execution tree due to an arity-satisfied function 
application within the current frame. 
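
The closure model used in Case 3 can be illustrated with a small Python sketch. This is only an illustration under our own assumptions: the class names `Closure` and `Frame` and the helper `apply` are hypothetical and are not part of the Id implementation. Each partial application extends the closure environment by one binding, and the arity-satisfying application produces an activation frame instead of another closure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Closure:
    """Models (clsr f^i, x_{k-i+1}, a_f, e_f^{k-i}): a function awaiting more arguments."""
    name: str
    params: tuple      # formal parameters x_1 .. x_k
    env: tuple = ()    # accumulated (parameter, value) pairs, i.e. e_f^{k-i}

    @property
    def remaining(self):
        # The superscript i on f^i: how many arguments are still needed.
        return len(self.params) - len(self.env)


@dataclass(frozen=True)
class Frame:
    """Models the activation frame created by the arity-satisfying application."""
    name: str
    env: tuple


def apply(clo, value):
    """One APP step: bind the next formal parameter to the supplied value."""
    next_param = clo.params[len(clo.env)]
    env = clo.env + ((next_param, value),)
    if clo.remaining > 1:
        return Closure(clo.name, clo.params, env)   # partial application
    return Frame(clo.name, env)                     # final application
```

Because every application builds a new environment tuple, intermediate closures remain valid and can be shared, which mirrors the extension $e_f^{k-i+1} = e_f^{k-i} + \{x_{k-i+1} \mapsto v_{k-i+1}\}$ used in the proof.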
Chapter 8 


Application Study: Tagless 
Garbage Collection 


In this chapter we study an important application of type reconstruction: Tagless Garbage 
Collection. We describe the compile-time and run-time support needed to perform garbage 
collection for a polymorphic language without any type-tags. We have implemented our scheme 
for the Id language running on a simulator for the *T multi-processor architecture. We describe 
this implementation and compare its performance with two other storage management schemes: 
first, a conservative garbage collector that does not use any type information, and second, a 
compiler-directed storage reclamation scheme that explicitly deallocates objects based on static 
life-time analysis. 


8.1 Introduction 


Dynamic memory management is an integral component of modern programming languages 
such as C, Common Lisp, Standard ML, and Haskell that support the notion of a globally 
shared heap of objects. It is possible to manage the heap memory manually by means of 
explicit allocation and deallocation calls, though manual storage reclamation is often a difficult 
and error-prone process. Usually, it is more convenient to use some automatic mechanism for 
storage reclamation such as an independent garbage collector that reclaims storage periodically 
once it is no longer in use. 

Traditionally, run-time systems geared towards automatic garbage collection use a tagged 
object representation model [App90, Wil92]. This enables the garbage collector to distinguish 
between scalar objects and pointers to heap objects without any support from the user or the 
compiler, although the user application has to pay the price of tagging and boxing objects and 
performing continuous tag maintenance. 

Recently, storage reclamation techniques with an untagged object representation model 
have received much attention. The motivation comes from a desire to use the full pointer 
addressability and native representation for scalars rather than a tagged representation, and 
to avoid the overhead of continuous tag maintenance. Some techniques, such as conservative 
garbage collection [Bar88, BW88] and compiler-directed storage reclamation [HJ92, Hic93], do 
not use any run-time type information. In contrast, garbage collection schemes based on type 
reconstruction [App89, Gol91, GG92] or explicit type propagation [Tol94] use source type information 
for identifying and traversing live heap objects. In this chapter, we will study and compare the 
performance of some of these techniques with a scheme based on full run-time type reconstruction. 


8.1.1 Storage Reclamation without Run-time Type Information 


In an untagged run-time system, no explicit type information is available at run-time in order 
to identify and traverse live objects. Still, it is possible to perform garbage collection using 
a conservative object identification strategy as shown by Boehm and Weiser [BW88]. In this 
scheme, the garbage collector guesses whether a given value is a scalar or a pointer to a heap 
object. Typically, the guess is based on certain assumptions about the location and alignment 
of actual pointer data. Since the guess is conservative, the garbage collector may assume some 
objects to be live when they are dead and fail to collect them. It may also be possible to 
compact or copy part of the live data that is definitely known to reside on the heap as shown by 
Bartlett [Bar88]. The feasibility and efficiency of such schemes depend crucially on the object 
representation convention used within the run-time system and the possibility of obscuring 
pointer/non-pointer information within the source language and the compiler. 
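
A conservative pointer test of this kind can be sketched in Python as follows. The sketch is ours and makes several assumptions that are not taken from [BW88] or [Bar88]: heap objects are aligned on 8-byte boundaries, the heap occupies a single address range, and the collector knows the set of allocated object addresses. The function names are hypothetical.

```python
WORD_ALIGN = 8  # assumption: heap objects start on 64-bit word boundaries


def may_be_pointer(word, heap_start, heap_end, allocated):
    """Conservatively guess whether a raw machine word refers to a heap object."""
    in_heap = heap_start <= word < heap_end
    aligned = word % WORD_ALIGN == 0
    return in_heap and aligned and word in allocated


def conservative_roots(words, heap_start, heap_end, allocated):
    """Keep every word that might be a pointer; scalars that look like pointers are kept too."""
    return [w for w in words if may_be_pointer(w, heap_start, heap_end, allocated)]
```

Note that an integer scalar whose value happens to equal an allocated address passes the test, which is exactly how a conservative collector may retain dead objects.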

In another scheme proposed by Hicks [HJ92, Hic93], the compiler performs life-time analysis 
of objects and automatically inserts explicit deallocation calls for an object that is determined 
to be dead at a particular point in the program. The compile-time cost of this analysis is 
substantial since the proposed scheme performs abstract interpretation over the entire program 
in order to determine the reference patterns of dynamically allocated objects and to approximate 
their life-times statically. However, once an object has been determined to be garbage, the 
run-time cost of deallocating it at an appropriate program point is minimal. Since static analysis 
is necessarily approximate due to undetermined control flow and sharing or aliasing of objects, 
this technique is also unable to reclaim all the garbage generated within the program. 


8.1.2 Garbage Collection using Run-time Type Reconstruction 


The primary motivation for a type-reconstruction-based garbage collection scheme is to take 
advantage of the enormous compile-time type information available in a statically-typed lan- 
guage in optimizing its run-time performance. In particular, it is possible in such a system to 
use an untagged and unboxed representation for scalar objects and eliminate type headers for 
heap objects without compromising the ability to perform complete object identification. All 
the desired type information may be automatically reconstructed when necessary. Although 
the cost of type reconstruction may be significant, it needs to be paid only when garbage col- 
lection is initiated. Therefore, such a scheme may work very well for scientific applications 
where numerical performance is of prime concern and garbage collection is expected to happen 
infrequently and is used in conjunction with explicit storage management. Keeping tagless 
data also permits easy inter-operability with conventional C and Fortran libraries that do not 
support tags. 

Full run-time type reconstruction also offers some unique advantages that are not present 
in other schemes for storage reclamation. Having the exact run-time types of objects allows the 
garbage collector to examine and traverse objects selectively. For example, the collector need 
not search for heap pointers inside a large array of floating point numbers. Similarly, the scalar 
fields of a record may be safely skipped. For scientific applications manipulating large numeric 
arrays, this may constitute a substantial saving in identifying the set of all live objects. 
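
The selective traversal described above amounts to a simple test on exact types. The following Python sketch is illustrative only; the type representation (strings for base types, tuples for constructed types) and the function names are our assumptions.

```python
SCALAR_TYPES = {"int", "float", "bool"}   # assumed set of unboxed base types


def is_scalar(ty):
    return ty in SCALAR_TYPES


def array_needs_scan(elem_type):
    """An array's elements are examined only if they may hold heap pointers."""
    return not is_scalar(elem_type)


def fields_to_scan(field_types):
    """Indices of the record fields that may hold heap pointers."""
    return [i for i, ty in enumerate(field_types) if not is_scalar(ty)]
```

With exact types available, an array of `float` is marked as a single object and none of its elements are inspected.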

It is also quite easy in this scheme to generate specialized traversal and marking functions 
for user-defined objects and function activation frames that understand their type and control 
structure. These functions selectively traverse the fields that point to heap objects as determined 
by their types, and mark those objects as live. Since these functions are specialized to the type 
of a particular object, they may be more efficient than interpreting the run-time reconstructed 
types of the objects. 


8.1.3 Related Work 


Goldberg and Gloger used type reconstruction to garbage collect a polymorphic language 
[GG92]. But their system did not guarantee complete type reconstruction. In a situation 
where a polymorphic function accessed only part of a complex object (see Section 6.1.2), say 
the spine of a linked list, their system could not determine the full type of the object and 
therefore could not traverse it completely. The authors argued that the inaccessible parts of 
the object were garbage anyway and therefore need not be marked as live. Unfortunately, 
the object could have shared references from other sources that access it farther than the first 
reference. To deal with such cases, the authors proposed maintaining hash tables of partially 
traversed data-structures as a way of identifying the extent to which an object was live and 
therefore should not be garbage collected. This scheme was both cumbersome and costly. On 
the other hand, our scheme of full type reconstruction allows the garbage collector to traverse 
the whole object the very first time without using any additional data-structures. 

Another interesting scheme has been proposed by Tolmach [Tol94] where type instantiation 
and propagation is made explicit in the program by converting it into an intermediate form based 
on the second-order λ-calculus [Rey74, HM93]. Under this transformation, every polymorphic 
object is parameterized with explicit type parameters for each of its polymorphic type-variables 
that are instantiated at the time of application to actual type arguments. This explicit run-time 
type information is used during garbage collection in much the same way as in our scheme. A 
minor problem in using this scheme is that in order to preserve the call-by-value semantics of 
ML-like programs, the polymorphic objects appearing on the right-hand-side of a let-binding 
must be restricted to syntactic values, i.e., identifiers, constants, or λ-expressions. Wright 
showed [Wri93] that this restriction is not too serious in practice. 

The explicit type parameters used in Tolmach’s system are similar in spirit to the explicit 
type-hints of our type reconstruction scheme, although we add explicit type parameters only 
for non-conserved type-variables. Our scheme can be considered as an optimal trade-off point 
between Goldberg’s scheme where no explicit type information is propagated at run-time, and 
Tolmach’s scheme where all polymorphic type-variables are instantiated using explicit run-time 
parameters. We insert explicit type parameters only where necessary assuming that the cost of 
reconstructing the remaining information at run-time is small. 


8.1.4 Goals and Scope of the Study 


The main goal of this study is to establish the feasibility of a type-reconstruction based tagless 
garbage collection scheme (TRGC) and to compare its performance with a conservative garbage 
collection scheme (CGC) and a compiler-directed storage reclamation scheme (CDSR) that does 
explicit deallocation. 

In order to make a reasonable performance comparison, we have implemented all three 
schemes for the same source language, compiler, and target architecture. Our source language 
is Id, which is a polymorphic, strongly-typed, implicitly parallel programming language 
[Nik91]. We are compiling Id for the *T multiprocessor architecture [NPA92, PBGB93] and 
executing it on an emulator for that machine. 

We have chosen a very simple “mark-and-sweep” garbage collection algorithm so that the 
cost of object identification can be clearly isolated during the mark phase. The wall clock 
performance of the garbage collection algorithm is not our major concern; we are primarily 
interested in the relative cost of type reconstruction and marking vs. the cost of conservative 
marking. The explicit allocation/deallocation scheme serves as a calibration point representing the 
essential cost of managing the storage. 


8.1.5 Outline 


The outline of the rest of the chapter is as follows. Section 8.2 describes the object represen- 
tation model in Id and summarizes the overall strategy for mark-and-sweep garbage collection 
based on run-time type reconstruction. Section 8.3 describes the compiler support required. 
In Section 8.4, we describe the run-time object marking schema based on complete type re- 
construction. In Section 8.5, we briefly describe the *T multi-threaded architecture and our 
implementation of the various storage management schemes on it. Section 8.6 discusses our 
benchmarks and presents the performance results. Finally, Section 8.7 presents the conclusions. 


8.2 Framework for Tagless Garbage Collection 


8.2.1 Object Representations and the Memory Model 


The Kernel Id intermediate language as shown in Figure 7.1 is an abstract intermediate form 
that does not take a position on the underlying representation of objects. However, a concrete 
implementation of a language must specify a representation of objects, which to a large extent, 
determines its run-time performance and the garbage collection strategy. In this section, we 
describe the concrete representation of Id objects for our current implementation. 

The object representation used in the Id run-time system is independent of the target 
architecture and only relies upon the assumption of a logically flat, shared, global address space. 
In order to keep the representation simple and efficient we avoid making any assumptions about 
boxing and explicit tagging of objects as much as possible. The only assumption necessary to 
support polymorphism is that we use the same basic unit of memory for all scalar objects and 
pointers to heap objects which in our case is a single 64-bit word. 

Examples of various Id object representations appear in Figure 8.1. Scalar objects are by 
definition untagged and unboxed in Id. $n$-dimensional arrays are linearized in row-major order 
into a flat data-structure that also keeps the bounds in each dimension $(l_1, u_1), \ldots, (l_n, u_n)$ 
and a set of linearization constants $c_0, \ldots, c_{n-1}$ that are used to compute the linear offset into 
the array given an $n$-dimensional index. For an algebraic datatype, depending on the total 
number $m$ and the arities $k_i$ of its various disjuncts, we may choose one of product, enumerated, 
implicit, or explicit representation. In all cases except when more than one non-nullary 
disjunct is present, we are able to choose an unboxed and untagged representation for 
the datatype. In particular, when there is exactly one non-nullary disjunct present, as in the 
case of the list datatype, we assume that heap pointers can be distinguished from a small fixed 
range of integers (say, 0-255), sufficient to represent all the nullary disjuncts of the datatype, 
and no explicit tag is necessary. For some applications, this may save a lot of space and time. 
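
As an illustration of row-major linearization, the following Python sketch computes a linear offset from per-dimension bounds. The exact form of the linearization constants stored by the Id run-time system is not specified here, so the stride-and-base formulation below, and the function names, are assumptions made for the example.

```python
def linearization_constants(bounds):
    """Return (strides, base) such that offset = sum(stride_j * i_j) - base."""
    extents = [u - l + 1 for (l, u) in bounds]
    strides = [1] * len(bounds)
    # Row-major order: the last dimension varies fastest.
    for j in range(len(bounds) - 2, -1, -1):
        strides[j] = strides[j + 1] * extents[j + 1]
    base = sum(s * l for s, (l, _) in zip(strides, bounds))
    return strides, base


def linear_offset(index, bounds):
    """Map an n-dimensional index to its offset in the flat array."""
    for i, (l, u) in zip(index, bounds):
        if not l <= i <= u:
            raise IndexError(f"index {i} outside bounds ({l}, {u})")
    strides, base = linearization_constants(bounds)
    return sum(s * i for s, i in zip(strides, index)) - base
```

For a 2 x 3 array with bounds (1, 2) and (1, 3), the index (1, 1) maps to offset 0 and (2, 3) maps to offset 5.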

There are two more kinds of objects that are created and manipulated indirectly at run- 
time by Id programs. These are function closures and activation frames. In an implementation 
without lambda-lifting and currying, function closures keep the values of the free identifiers 
of a function obtained from its lexical environment. In our implementation, all functions are 
already lambda-lifted, so the closures carry just the curried arguments accumulated under 
partial applications. We use the structure depicted in Figure 8.1 which permits sharing of 
intermediate closures. 


[Figure 8.1: Run-time Object Representations for Id. Recoverable panels: Scalar (unboxed and 
untagged); algebraic datatype with more than one non-nullary disjunct; Function Closure 
(def F x1 ... xk); Activation Frame (size, arguments, locals, return continuation).] 

An activation frame is a temporary storage area used by an executing function as a scratch 
pad keeping its input arguments and temporary intermediate values. In Kernel Id, the bound 
variables of a function constitute the intermediate values that need to be kept within its 
activation frame for future use.¹ The frame also keeps the return continuation, consisting of the 
caller’s activation frame and the return instruction pointer. In a sequential system, activation 
frames are usually allocated on a stack. In our parallel execution model, the linear stack of 
activation frames generalizes to a tree and is managed explicitly by the run-time system. 


¹ An intelligent compiler back-end may be able to share some frame slots based on live-variable analysis, but 
we are ignoring that issue here for simplicity. 


8.2.2 Overall Strategy 


The overall strategy for a simple mark-and-sweep garbage collection based on run-time type 
reconstruction is summarized below and described in the following sections: 


1. At compile-time, we ensure that every object manipulated by the user program (including 
function closures and activation frames) is assigned a static, possibly polymorphic, datatype 
that accurately describes the structure of that object (Section 8.3). 

2. When the garbage collector is invoked at run-time, first we reconstruct the type of every 
activation frame present within the current dynamic call tree using the algorithm described 
in the last chapter. The reconstruction mechanism instantiates the compile-time type de- 
scription of each activation frame to its exact run-time type. 

3. Next, within the mark phase of the garbage collector, each slot of a reconstructed frame 
is examined and its reconstructed type is used to mark the heap objects reachable from 
that slot as live. This may be done in two ways: the reconstructed types may be directly 
interpreted to identify and traverse the heap objects, or the compiler may automatically 
generate specialized traversal and mark routines that are appropriately composed at run- 
time in order to mark the live objects (Section 8.4). 

4. Finally, the unmarked heap objects are reclaimed as garbage by sweeping the entire heap. 
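
Steps 2 to 4 can be sketched in Python as below. This is a toy model under our own assumptions rather than the actual Id run-time code: the heap is a dictionary from addresses to cons cells `(head, tail)`, `None` stands for nil, only scalar and list types are modelled, and each frame is assumed to arrive with already reconstructed slot types.

```python
def mark(heap, ty, value, marked):
    """Mark everything reachable from `value`, guided by its exact type `ty`."""
    if ty in ("int", "float", "bool"):
        return                          # scalars: nothing to mark
    _kind, elem = ty                    # only ("list", elem_type) is modelled here
    while value is not None and value not in marked:
        marked.add(value)
        head, tail = heap[value]
        mark(heap, elem, head, marked)  # the element type decides whether head is a pointer
        value = tail


def collect(heap, frames):
    """Mark from typed frame slots, then sweep the unmarked cells."""
    marked = set()
    for frame in frames:                # step 2: frames with reconstructed types
        for ty, value in frame:         # step 3: examine each typed slot
            mark(heap, ty, value, marked)
    for addr in [a for a in heap if a not in marked]:
        del heap[addr]                  # step 4: sweep
    return marked
```

A slot of type `int` whose value coincides with a heap address is ignored, in contrast with a conservative collector.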


8.3 Compiler Support for Object Identification 


8.3.1 Visible and Invisible Datatypes 


The scalar basetypes, algebraic datatypes, and array types in Kernel Id correspond to pure data- 
objects whose types are directly visible at the source language level. There is a direct, fixed 
mapping from the source types of these objects to their internal representations as described in 
Section 8.2.1. This mapping may be directly used in traversing these objects at run-time once 
their exact source type is determined. 

On the other hand, arrow types (→) correspond to two different run-time objects: function 
closures which behave like data-objects that must be garbage collected, and activation frames 
which are control-objects consisting of the live object root set. Neither of these is modeled 
completely by the source-level arrow type. This is because the visible type signature of a 
function does not provide any clue regarding the types of the arguments hidden inside its 
closure, nor does it provide any information about the local variables kept within the function’s 
activation frame. In order to treat all Id run-time objects uniformly in terms of Id source 
types, we define invisible source-level datatypes for function closures and activation frames 
that provide an exact description of their contents. 


8.3.2 Modeling Function Closures 


In order to simplify the type reconstruction analysis, we model the closures corresponding to 
partial applications of a function as disjuncts of an invisible algebraic datatype that is auto- 
matically derived at compile-time from the corresponding function signature. This derivation is 
shown in Figure 8.2. The various disjuncts of this hidden datatype represent successive partial 
applications of the function and identify the number and the types of the accumulated argu- 
ments. This indirect model captures all the necessary type information required to traverse the 
actual run-time representation of a function closure as shown in Figure 8.1. Given a run-time 
closure object, we can map it to an algebraic disjunct in this model by examining its function 
code-block pointer and the remaining arity slot. Then, given the exact algebraic type of the 
closure, the arguments contained within the closure can be traversed using the argument types 
of the mapped disjunct. 

As an example, below we show a function eqlen that compares the length of two lists. We 
also show its Hindley/Milner visible source type and its automatically derived hidden closure 
datatype: 


Example 8.1: 
def eqlen l1 l2 =              % eqlen :: ∀αβ.(list α) → (list β) → bool 
    { len1 = length l1; 
      len2 = length l2; 
      p = len1 == len2; 
     in p }; 


type eqlen_closure α β =       % Hidden Closure Type 
      eqlen_ap_0 
    | eqlen_ap_1 (list α); 


f = eqlen (1:nil);             % f :: ∀β.(list β) → bool 
                               % f :: ∀β.(eqlen_closure int β) 


The constructor eqlen_ap_0 models the closure representation of the eqlen function itself, 
while eqlen_ap_1 represents the closure formed by a partial application of the eqlen function 
to one argument. The example also shows the source type and the invisible type of a partial 
application of the eqlen function.² Note that the invisible type records the fact that the hidden 
first argument within the closure is a list of integers while this information is not present in the 
source type. 

There is no need to make a closure for eqlen with two arguments since at that point its 
arity is fully satisfied and the application gives rise to an activation frame instead of a function 
closure.³ Finally, note that the invisible closure datatype is parameterized by all the type- 
variables present in the source type of the function. This is necessary in order to model the 
exact run-time types of all the arguments contained within the closure. 
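
The derivation of the hidden closure datatype, and the mapping from a run-time closure to one of its disjuncts, can be sketched in Python. The dictionary representation and the function names below are hypothetical; only the naming of the disjuncts follows the eqlen example.

```python
def derive_closure_datatype(fname, arg_types):
    """Disjunct <fname>_ap_i records the types of the first i accumulated arguments."""
    return {f"{fname}_ap_{i}": tuple(arg_types[:i]) for i in range(len(arg_types))}


def disjunct_for(fname, arity, remaining):
    """Map a run-time closure (function name, remaining arity) to its disjunct name."""
    return f"{fname}_ap_{arity - remaining}"


# eqlen :: (list a) -> (list b) -> bool, so there are two closure disjuncts.
eqlen_closure = derive_closure_datatype("eqlen", [("list", "a"), ("list", "b")])
```

A closure for eqlen that still needs one argument maps to `eqlen_ap_1`, whose single field of type `(list a)` tells the collector how to traverse the hidden first argument.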


8.3.3 Modeling Activation Frames 


Function activation frames are modeled using an automatically derived, invisible datatype 
called the function framemap as shown in Figure 8.2. This is simply a record datatype with 
a field for every actual frame-slot (cf. Figure 8.1). Besides the scalar datatype fields for the 
code-block entry point, the frame size and the return continuation, the framemap records the 
types of the function arguments and the local identifiers used within the function body. 

Abstractly, the framemap of a function provides a logical subset of the type information 
recorded within its type-map (Definition 6.1) and is parameterized by the same type-variables. 
The framemap simply provides a concrete static image of a function’s dynamic activation frame 
and therefore may depend on its actual implementation on a given platform. After type recon- 
struction is complete, each activation frame is associated with a fully instantiated type-map 
from which an appropriate framemap instance can be derived in order to traverse the heap 
objects accessible through each frame-slot.⁴ 
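
As a rough illustration, a framemap can be represented as an ordered list of (slot name, type) pairs that is instantiated once the type-variables are known. The Python sketch below uses the eqlen function of Example 8.1; the representation, the treatment of `code` and `cont` as scalar types, and all function names are our assumptions.

```python
SCALAR_TYPES = {"code", "int", "cont", "bool", "float"}  # assumed non-pointer slot types


def derive_framemap(args, locals_):
    """Header slots, then argument slots, then local-identifier slots."""
    header = [("I", "code"), ("N", "int"), ("R", "cont")]
    return header + list(args) + list(locals_)


def instantiate(ty, subst):
    """Replace type-variables by their reconstructed run-time types."""
    if isinstance(ty, tuple):
        return tuple(instantiate(t, subst) for t in ty)
    return subst.get(ty, ty)


def pointer_slots(framemap, subst):
    """Names of the frame slots whose instantiated type is not a scalar."""
    return [name for name, ty in framemap
            if instantiate(ty, subst) not in SCALAR_TYPES]


eqlen_framemap = derive_framemap(
    args=[("l1", ("list", "a")), ("l2", ("list", "b"))],
    locals_=[("len1", "int"), ("len2", "int"), ("p", "bool")],
)
```

Only the two list arguments are reported as slots to traverse; the header slots and the integer and boolean locals are skipped.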


² “:” is the infix cons constructor for lists. 

³ However, under delayed or lazy evaluation, we may need to keep track of such thunks. 

⁴ In our current implementation, the type-map produced by the Id compiler is tailored to the structure of 
the activation frames used in the *T run-time system. Therefore, we directly use the type-map of a function to 
traverse its activation frame. 


INVISIBLE DATATYPES 

Given a Function Declaration:   def F x_1 ... x_n = E 
                                F :: ∀α_1 ... α_t. τ_1 → ... → τ_n → τ_{n+1} 
Let (z_1 :: σ_1) ... (z_m :: σ_m) be the locally bound identifiers of E. 

1. Define Function Closure Datatype: 
   type F_closure α_1 ... α_t = F_ap_0 
                              | F_ap_1 τ_1 
                              | ... 
                              | F_ap_{n-1} τ_1 ... τ_{n-1}; 

2. Define Function Framemap Datatype: 
   type F_framemap α_1 ... α_t = 
     {record (I :: code)        % Code-Block Entry Point 
             (N :: int)         % Frame Size 
             (R :: cont)        % Return Continuation 
             (x_1 :: τ_1)       % Arguments 
             ... 
             (x_n :: τ_n) 
             (z_1 :: σ_1)       % Local Identifiers 
             ... 
             (z_m :: σ_m) }; 

Figure 8.2: Automatic Derivation of Invisible Datatypes. 


As an example, we show the framemap datatype for the eqlen function given above: 


Example 8.2: 
type eqlen_typemap α β = 
  {record 
     (eqlen   :: code) 
     (size    :: int) 
     (retcont :: cont) 
     (l1      :: (list α)) 
     (l2      :: (list β)) 
     (len1    :: int) 
     (len2    :: int) 
     (p       :: bool) }; 


8.3.4 Run-time Type Encodings 


Run-time type reconstruction requires an encoding of all the visible and invisible datatypes of a 
program that is used to encode type-hints and to represent the exact run-time types of objects 
during type reconstruction. We showed such an encoding and decoding scheme in Figure 7.3 in 
Chapter 7. In this scheme, each algebraic datatype $T^n$ is encoded into a corresponding static 
type descriptor $\mathsf{T}^n$ that contains all the necessary compiler information about its arity, internal 
field structure, and its representation. 

Our compiler generates static type descriptors for all the user-defined algebraic datatypes 
and the automatically derived closure and framemap datatypes (Figure 8.2) for each declared 
function within the program. These static descriptors are linked together with the object 
program and are used by the run-time system during type reconstruction. Run-time types 
are encoded as a flat array of static type descriptors using back-pointers to preserve sharing. 
This representation permits very efficient copying, unification, and instantiation operations on 
encoded types. The packing and unpacking of these encoded types is carried out on the fly 
within the run-time system. 
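
One possible reading of a flat encoding with back-pointers is sketched below in Python. This is not the encoding of Figure 7.3; the preorder layout, the `("ref", index)` back-pointer entries, and the descriptor table `ARITY` are assumptions chosen to keep the example small.

```python
ARITY = {"int": 0, "float": 0, "list": 1, "pair": 2}   # hypothetical descriptor table


def encode(ty, out=None, seen=None):
    """Flatten a type term in preorder; repeated compound subterms become back-pointers."""
    if out is None:
        out, seen = [], {}
    if isinstance(ty, tuple):
        if ty in seen:
            out.append(("ref", seen[ty]))   # share the earlier occurrence
            return out
        seen[ty] = len(out)
        out.append(ty[0])
        for arg in ty[1:]:
            encode(arg, out, seen)
    else:
        out.append(ty)
    return out


def decode(code, pos=0):
    """Return (type term, next position) for the encoding starting at `pos`."""
    entry = code[pos]
    if isinstance(entry, tuple):            # back-pointer entry
        return decode(code, entry[1])[0], pos + 1
    args, nxt = [], pos + 1
    for _ in range(ARITY[entry]):
        arg, nxt = decode(code, nxt)
        args.append(arg)
    return ((entry, *args) if args else entry), nxt
```

In this sketch the type `(pair (list int) (list int))` is stored as four entries, the last one pointing back to the first occurrence of `(list int)`.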


8.4 Run-time Object Traversal and Marking 


In this section, we describe our scheme for object traversal and marking assuming complete 
type reconstruction has been performed. We present two mechanisms: 


Interpreted Marking - In this mechanism, the encoded types generated by type reconstruc- 
tion are directly used to guide the traversal and marking of the heap objects. 


Compiled Marking - In this mechanism, the compiler automatically generates marking func- 
tions for each datatype in the program based solely on the static type information. These 
functions are appropriately composed at run-time using the reconstructed types and then 
directly applied to the corresponding objects. 


Both mechanisms are specified as a set of mark functions, one for each basetype, array type, 
and algebraic datatype present in the program. The algebraic datatype could be a user-defined 
datatype (Figure 7.1) or an invisible datatype defined by the compiler for function closures and 
activation frames (Figure 8.2). 
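
The idea behind compiled marking, namely composing per-datatype mark functions once from a reconstructed type and then applying the result directly, can be sketched in Python. The sketch is illustrative only: lists are modelled as `[head, tail]` cells with `None` for nil, object identity stands in for mark bits, and the names `GENERATORS` and `compile_marker` are hypothetical.

```python
def mark_scalar(x, marked):
    """Scalars and dummy type-variables: nothing to mark."""
    return None


def make_mark_list(mark_elem):
    """Generator for the list datatype, parameterized by the element marker."""
    def mark_list(x, marked):
        while x is not None and id(x) not in marked:
            marked.add(id(x))           # mark the cons cell itself
            mark_elem(x[0], marked)     # then its element
            x = x[1]
    return mark_list


GENERATORS = {
    "int": lambda: mark_scalar,
    "float": lambda: mark_scalar,
    "list": make_mark_list,
}


def compile_marker(ty, env):
    """Compose mark functions following the structure of the type `ty`."""
    if isinstance(ty, str):
        if ty in env:                   # type-variable: look up the translation environment
            return env[ty]
        return GENERATORS[ty]()
    head, *args = ty
    return GENERATORS[head](*(compile_marker(a, env) for a in args))
```

Once `compile_marker` has run, marking an object involves no further inspection of its type, which is where the efficiency gain over interpretation would come from.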


8.4.1 Interpreted Marking 


The Interpreted Marking Schema M for a type $T^n$ is shown in Figure 8.3. In this schema, 
for each type $T^n$ with $n$ type parameters $\alpha_1 \cdots \alpha_n$, we define a mark function mark_T that is 
parameterized by $n$ corresponding encoded type arguments $z_1 \cdots z_n$. At run-time, this function 
is supplied with the exact encoded type instantiation of its type parameters, say $\tau_1 \cdots \tau_n$, which 
produces an appropriate marking function for an object with type $(T^n\ \tau_1 \cdots \tau_n)$. 

The internal structure of the mark functions closely follows the structure of their corresponding 
datatypes. The polymorphic, bound type-variables of a type-scheme are mapped to 
dummy mark functions because polymorphic objects contain no information. Similarly, all our 
base types are scalars, so the mark functions for them do nothing. The mark functions for arrays 
and algebraic datatypes first mark the object itself and then proceed to mark its internal 
components. This is achieved by first computing the exact run-time type encoding for each of 
the components and then interpreting that encoding. The code to compute the exact type 
encoding is directly compiled into the mark functions using the TEnc[ ] scheme shown earlier 
(Figure 7.3). 

The overall process of interpretive marking is governed by the top-level type-code
interpretation function shown in Figure 8.4. Here, we have generalized the type-code interpretation
scheme Interpret[] for an arbitrary datatype schema R such as the marking schema M of
Figure 8.3. This process unpacks the encoded type and invokes the schema function for the
appropriate type descriptor passing it the rest of the encoded type arguments. In the present
MARKING SCHEMA $M$


Given a polymorphic type-variable $\alpha_i$, define $M[T_{\alpha_i}]$ = mark_T$_{\alpha_i}$, where

    def mark_T$_{\alpha_i}$ () = $\lambda x.()$


Given a Type $T^n$, define $M[T^n]$ = mark_T, where


1. $T^0$ is a BaseType (int | float):

    def mark_T () = $\lambda x.()$


2. $T^1$ is an ArrayType (nd_array $\alpha$):

    def mark_nd_array ($z$) =
        $\lambda a.$ { Mark($a$);
              $(l_1, u_1), \ldots, (l_n, u_n)$ = bounds($a$);
              for $i_1 \leftarrow l_1$ to $u_1$ do
                  ...
                  for $i_n \leftarrow l_n$ to $u_n$ do
                      Interpret[$M$] (TEnc[$\alpha$] $\{\alpha \mapsto z\}$) $a[i_1, \ldots, i_n]$;
            }


3. $T^n$ is an Algebraic DataType ($T^n\ \alpha_1 \cdots \alpha_n$):

    def mark_T ($z_1, \ldots, z_n$) =
        $\lambda x.$ { Mark($x$);
              Case_T $x$ of
                  $C_1\ x_1 \cdots x_{k_1}$ $\Rightarrow$ { Interpret[$M$] (TEnc[$\tau_{1,1}$] $\{\alpha_i \mapsto z_i\}$) $x_1$;
                                    ...
                                    Interpret[$M$] (TEnc[$\tau_{1,k_1}$] $\{\alpha_i \mapsto z_i\}$) $x_{k_1}$ }
                  ...
                | $C_m\ x_1 \cdots x_{k_m}$ $\Rightarrow$ { Interpret[$M$] (TEnc[$\tau_{m,1}$] $\{\alpha_i \mapsto z_i\}$) $x_1$;
                                    ...
                                    Interpret[$M$] (TEnc[$\tau_{m,k_m}$] $\{\alpha_i \mapsto z_i\}$) $x_{k_m}$ }
            }


Figure 8.3: Generating Mark Functions for Datatypes.


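Given a Datatype Schema $R$, define


    Interpret[$R$] $\tau$ = { Case head($\tau$) of
                                $T_1^{n_1}$ $\Rightarrow$ $(R[T_1^{n_1}])\ \mathit{args}^{n_1}(\tau)$
                              | ...
                              | $T_k^{n_k}$ $\Rightarrow$ $(R[T_k^{n_k}])\ \mathit{args}^{n_k}(\tau)$ }


Figure 8.4: Type-code Interpretation at Run-time.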
Given a Datatype Schema $R$ and a Translation Environment $\rho$, define


    Compile[$R$] $\rho$ $\alpha$ = $\rho(\alpha)$


    Compile[$R$] $\rho$ $(T^n\ \tau_1 \cdots \tau_n)$ = $(R[T^n])$ (Compile[$R$] $\rho$ $\tau_1$, ..., Compile[$R$] $\rho$ $\tau_n$)


    Compile[$R$] $\rho$ $\forall \alpha_1 \cdots \alpha_n.\tau$ = Compile[$R$] $\rho$ $(\tau[T_{\alpha_i}/\alpha_i])$


Figure 8.5: Type-based Translation at Compile-time.


case, (Interpret[$M$] $\tau$ $x$) traverses and marks the object $x$ according to its exact run-time
type encoding $\tau$ by recursively instantiating and invoking the mark functions associated with
the type descriptors in $\tau$. Other structured datatype schemas such as a printing schema or an
I/O schema may also be defined and interpreted in a similar manner.
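
To make the interplay between Figure 8.3 and Figure 8.4 concrete, the following Python sketch shows one way the interpretive scheme could be modeled. It is illustrative only: the tuple encoding of types, the HeapObj class, and the list datatype with constructors Nil and Cons are assumptions of the sketch, not the Id run-time representation.

```python
# Hypothetical sketch of interpreted marking (Figures 8.3 and 8.4).
# An encoded type is a tuple: (descriptor_name, encoded_arg_1, ..., encoded_arg_n).

class HeapObj:
    """A heap object: a constructor tag, its fields, and a mark bit."""
    def __init__(self, tag, *fields):
        self.tag, self.fields, self.marked = tag, list(fields), False

def mark_scalar():
    # Base types are scalars: the mark function does nothing.
    return lambda x: None

def mark_list(z):
    # Mark function for an assumed datatype: list a = Nil | Cons a (list a).
    # z is the encoded type supplied for the type parameter a.
    def mark(x):
        x.marked = True                      # Mark(x)
        if x.tag == "Cons":                  # Case_T x of ...
            head, tail = x.fields
            interpret(z, head)               # TEnc[a]{a -> z} is just z
            interpret(("list", z), tail)     # TEnc[list a]{a -> z}
    return mark

# Schema M: one mark function per type descriptor.
SCHEMA_M = {"int": mark_scalar, "float": mark_scalar, "list": mark_list}

def interpret(enc, x, schema=SCHEMA_M):
    # Interpret[R] t: dispatch on head(t) and pass args(t) to the schema function.
    head, args = enc[0], enc[1:]
    schema[head](*args)(x)

inner = HeapObj("Cons", 1, HeapObj("Nil"))
outer = HeapObj("Cons", inner, HeapObj("Nil"))
interpret(("list", ("list", ("int",))), outer)   # outer : list (list int)
```

Each recursive step rebuilds the encoding of a component and then interprets it, which is the run-time cost that the compiled schema avoids. The sketch omits any check for already-marked objects.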

In our current implementation, the type-code interpretation mechanism of Figure 8.4 is built 
into the run-time system. The marking process is invoked for each type-reconstructed activation 
frame present in the dynamic activation tree. The run-time system constructs the exact run- 
time type encoding of every frame-slot in the given activation frame and then directly dispatches 
to the appropriate marking function based on the datatype class as specified in Figure 8.3. The 
marking process is further optimized based on the actual representation chosen for a particular 
class of datatypes as shown in Figure 8.1. For example, the marking function for linearized 
arrays computes the total size of the array and marks each of its elements in a single loop. In 
case of algebraic types, nullary disjuncts under enumerated or implicit representation are never 
marked, a product disjunct is always marked, and a tag dispatch is made for explicitly tagged 
disjuncts. Finally, the hidden arguments inside function closures are traversed and marked 
according to their reconstructed hidden closure types. 
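
As a small illustration of the linearized-array case, the Python sketch below computes the element count from the bounds and visits the flat storage in one loop. The bounds list, the flat storage list, and the mark_element callback are hypothetical names chosen for the sketch.

```python
from math import prod

def mark_linearized_array(bounds, storage, mark_element):
    # bounds is [(l_1, u_1), ..., (l_n, u_n)]; storage is the flat list of elements.
    total = prod(u - l + 1 for l, u in bounds)   # total number of elements
    for k in range(total):                       # one loop instead of n nested loops
        mark_element(storage[k])

seen = []
mark_linearized_array([(1, 2), (0, 2)], list("abcdef"), seen.append)
```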


8.4.2 Compiled Marking 


Rather than interpreting type encodings as in the interpreted marking schema, it is also
possible to generate compiled marking functions for each datatype that know how to traverse the
object directly without any type interpretation. In this Compiled Marking Schema $M'$, for each
datatype $T^n$ the compiler automatically generates a mark function mark'_T that is
parameterized by $n$ mark function arguments $f_1 \cdots f_n$ instead of encoded type arguments. This alternate
marking schema $M'$ can be directly obtained from our interpreted marking schema $M$ shown
in Figure 8.3 by replacing the recursive call for interpretation:


    Interpret[$M$] (TEnc[$\tau$] $\{\alpha_i \mapsto z_i\}$)

by a type-based function composition:

    (Compile[$M'$] $\{\alpha_i \mapsto f_i\}$ $\tau$)


This transformation expresses the fact that building the exact run-time type encoding of an 
object and then interpreting it to guide the traversal and marking is functionally equivalent 
to directly traversing it using a compiled marking function that knows the structure of that 
object. 

The general mechanism of type-based function composition (Compile[$R$] $\rho$ $\tau$) for an
arbitrary schema $R$ (such as the compiled marking schema $M'$) is shown in Figure 8.5.
Generating compile-time type encodings as shown in Figure 7.3 may be thought of as a special
case of this mechanism. This mechanism translates a given static type $\tau$ into a
composition of schema functions specified by $R$ under a translation environment $\rho$ that maps free
type variables of $\tau$ to schema-dependent values. For the case of the compiled marking schema,
(Compile[$M'$] $\{\alpha_i \mapsto f_i\}$ $\tau$) creates a function composition that is capable of marking an
object whose type is a run-time instance of the static type $\tau$. Note that the marking function
so generated does not contain any type-code interpretation. Its execution results directly in
the appropriate traversal and marking of the given object.
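
The composition can be sketched in Python as follows. This is a minimal model under stated assumptions: static types are written as tuples, the list datatype and the HeapObj class are invented for the example, and only the first two rules of Figure 8.5 are covered (the rule for quantified types is left out).

```python
# Hypothetical sketch of compiled marking (schema M' and Figure 8.5).
# Static types are tuples: ("var", name) or (constructor_name, arg_type, ...).

class HeapObj:
    def __init__(self, tag, *fields):
        self.tag, self.fields, self.marked = tag, list(fields), False

def mark_scalar():
    return lambda x: None            # scalars need no marking

def mark_list(f):
    # Mark function for an assumed datatype: list a = Nil | Cons a (list a).
    # f is a mark function for the element type, not an encoded type.
    def mark(x):
        x.marked = True
        if x.tag == "Cons":
            head, tail = x.fields
            f(head)                  # Compile[M']{a -> f} a is f
            mark(tail)               # Compile[M']{a -> f} (list a) is this function
    return mark

SCHEMA_M_PRIME = {"int": mark_scalar, "float": mark_scalar, "list": mark_list}

def compile_type(schema, env, ty):
    # Compile[R] rho t: translate a static type into a composition of schema functions.
    if ty[0] == "var":
        return env[ty[1]]                                  # rho(a)
    args = [compile_type(schema, env, t) for t in ty[1:]]
    return schema[ty[0]](*args)                            # (R[T]) applied to compiled args

# Static type list (list a), with a bound to the scalar mark function.
marker = compile_type(SCHEMA_M_PRIME, {"a": mark_scalar()},
                      ("list", ("list", ("var", "a"))))
inner = HeapObj("Cons", 1, HeapObj("Nil"))
outer = HeapObj("Cons", inner, HeapObj("Nil"))
marker(outer)
```

Once marker has been built, applying it involves no dispatch on type codes: the traversal is fixed by the closures that compile_type composed.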

The compiled marking process is initiated by converting the reconstructed type-map of each 
activation frame into a composition of compiler-generated marking functions. This translation 
is similar to the type-based function composition shown in Figure 8.5 except that it operates 
on type encodings rather than static types. The resulting function composition may be directly 
applied to the given activation frame to mark all heap objects reachable from that frame. The 
compiled marking schema is currently unimplemented. 


8.4.3 Variations on Marking Schemes 


The interpreted and the compiled marking schemes described above are just two among a
full spectrum of possible marking schemes that depend on the degree of type specialization
performed at compile-time and the degree of type interpretation performed at run-time. For
instance, it is possible to have a marking schema that takes an intermediate position between the
completely interpreted schema M and the completely compiled schema M’. In this schema,
calls to the top-level interpretive dispatch (Figure 8.4) may be statically specialized to call the 
marking functions of schema M™ directly, although dynamic type-hints may still have to be 
interpreted at run-time. 

It is also possible to specialize the type-hint propagation and the type reconstruction mech- 
anism described in the last chapter (Section 7.2 and Section 7.3) for the explicit purpose of 
object marking. In this scheme, the compiler would insert code to generate and propagate 
type-hints (Section 7.2.3) that consist of compositions of mark functions rather than run-time 
type encodings. The type reconstruction algorithm (Section 7.4.4) would also be modified to 
deal with such type-hints and the algorithm would return a higher-order composition of mark 
functions for the given activation frame rather than a reconstructed type-map. The mark
function so obtained would be directly applied to the activation frame to mark all heap objects
accessible from it.⁵

An independent variation for any of the compiled marking schemes is to generate as many 
specialized marking functions as possible at compile-time for every static type occurring in the 
program rather than generating compositions of a fixed set of datatype marking functions as 
shown above. This would clearly reduce the overhead of using higher-order marking functions. 


8.5 *T Implementation 


*T is a parallel, distributed-memory machine with a high performance interconnection net- 
work [NPA92, PBGB93]. The *T architecture extends a basic RISC instruction set with 
low-overhead, user-mode communication and synchronization primitives. The details of the 
architecture may be found elsewhere [Bec92]. In this section, we briefly summarize some of the 


⁵ Readers familiar with Haskell’s type classes [HWe90, WB89] would immediately recognize that in Haskell,
we can accommodate all variations of type reconstruction and its applications by declaring a universal class trec
that provides type encodings, mark functions, print functions, etc. as independent methods.
design features and the terminology of the *T architecture that are relevant to the
implementation of Id on *T and then describe our implementation of distributed garbage collection on
this machine. 


8.5.1 Multi-threaded Execution: Processor View 


In our study, we used a simulator for the *T architecture based on Motorola’s 88110MP pro- 
cessor. The 88110MP is a super-scalar RISC processor extended with an on-chip message and 
synchronization unit (MSU) which provides hardware support for scheduling microthreads. A 
microthread is a compiler-defined sequence of instructions executing within the context of an 
activation frame. A microthread descriptor identifying a microthread consists of an instruction 
pointer (IP) and a frame pointer (FP) (see Figure 6.2). A microthread, by definition, executes
to completion once it has been invoked. It may send messages or fork other microthreads that 
are deposited in a stack of ready-to-run microthreads. 

*T processors communicate with each other by sending messages via the network. Messages 
consist of 4 to 24 32-bit words. Due to the on-chip message unit, *T messages may be dis- 
patched and handled very quickly using the general-purpose processor registers directly (6 and 
12 instructions respectively for a full-sized message). Messages always contain a microthread 
descriptor as the first two words of payload. Normally, messages are handled by invoking the 
microthread described within the message, so these microthreads are termed message handlers. 

A microthread’s last operation is to schedule the next microthread of the highest priority 
which is selected from a simple priority queue consisting of handlers of incoming messages, the 
microthread stack, and several microthread registers. Message handlers have higher priority 
than computation microthreads. 
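
The priority rule can be sketched as follows. The sketch is in Python with made-up descriptor values; it models only the incoming message queue and the microthread stack and omits the microthread registers.

```python
from collections import deque

# Hypothetical sketch: pick the next microthread descriptor (ip, fp).
# Handlers of incoming messages take priority over the stack of ready microthreads.

def next_microthread(incoming_messages, ready_stack):
    if incoming_messages:
        msg = incoming_messages.popleft()
        return msg[0], msg[1]          # first two words: the handler's IP and FP
    if ready_stack:
        return ready_stack.pop()       # most recently forked computation microthread
    return None                        # nothing to run

messages = deque([("handler_ip", "frame_a", "payload")])
stack = [("thread1_ip", "frame_b"), ("thread2_ip", "frame_c")]
order = []
while (d := next_microthread(messages, stack)) is not None:
    order.append(d[0])
```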


8.5.2 Multi-threaded Execution: System View 


*T runs a Unix-like operating system. A parallel job running on *T consists of a separate 
process, or a player, on each processor. Players belonging to the same parallel job are scheduled 
at the same time on their respective processors by the operating system. The players have 
independent 32-bit virtual address spaces, but may refer to a global 64-bit address space through 
the MSU by sending messages to each other. 

The Id compiler and its run-time system for *T provide the high-level abstraction of a 
single, implicitly parallel program running within a shared, global address space as shown in 
Figure 6.2. The Id compiler statically partitions the user program into several microthreads that 
are scheduled dynamically during execution. Microthreads communicate and synchronize with 
each other via messages. Microthreads belonging to a single Id procedure execute within the 
context of a shared activation frame and may also communicate with each other via the frame. 
Since successively scheduled microthreads on a processor may be completely independent, the 
general-purpose registers within the processor are kept local to a microthread and are not 
used to communicate data across microthreads. However, registers may still be used to pass 
parameters to C functions called within a single microthread. 

The Id run-time system consists of the frame manager, the heap manager, and protocol 
handlers for I-structure and M-structure memory operations [CCF+93]. All run-time system
calls are initiated and serviced as split-phase transactions. A microthread sends a message to a 
run-time system request handler passing it the descriptor of a microthread that would receive 
the reply. The request handler services the request and returns the result in a message to the 
reply handler provided with the request. This scheme ensures that computation microthreads 
never block the processor pipeline and can always run to completion.⁶ This invariant guarantees
that run-time system exceptions such as running out of frame or heap memory always happen 
at the boundary of a computation microthread. At that moment, none of the general-purpose 
registers contain any live data and the complete root set of heap objects is available within the 
tree of activation frames. 
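
A split-phase run-time system call can be modeled roughly as below. This Python sketch uses a single in-process queue in place of the network and a toy bump allocator in place of the heap manager; all names are hypothetical.

```python
from collections import deque

# Hypothetical sketch of a split-phase run-time system call.
# A request carries the descriptor of the microthread that will receive the reply.

network = deque()                      # messages in flight
heap_top = [0]                         # toy bump-allocator state

def alloc_request_handler(size, reply_handler, frame):
    # Run-time system side: service the request, then send the reply message.
    addr = heap_top[0]
    heap_top[0] += size
    network.append((reply_handler, frame, addr))

def alloc_reply_handler(frame, addr):
    # Second phase: deposit the result in the requester's activation frame.
    frame["ptr"] = addr

def requester(frame):
    # First phase: send the request and run to completion without waiting.
    network.append((alloc_request_handler, 16, alloc_reply_handler, frame))

frame = {}
requester(frame)
while network:                         # dispatch loop: each handler runs to completion
    handler, *args = network.popleft()
    handler(*args)
```

No handler in the sketch ever waits for another, so every microthread boundary is a point at which all live data sits in frames rather than in registers.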

The Id run-time system sets up the players participating in a parallel job to continuously 
execute a microthread dispatch loop where microthreads are scheduled according to the priority 
scheme described earlier. One of the players (processor 0) is set up to allocate the root activation
frame and launch the first microthread along with its user-supplied arguments. It also receives 
the final result and coordinates the termination of the parallel job. 


8.5.3 Memory Organization 


For the purpose of executing Id programs, the *T machine is logically divided into two 
kinds of nodes: computation nodes and memory nodes (see Figure 8.6). The computation 
nodes manage the dynamic tree of activation frames and execute computation microthreads 
while the memory nodes manage the heap memory and handle various protocols for memory 
references. 

The address space of a player running on a *T processor is divided into several areas that 
are themselves distributed or replicated across the nodes as shown in Figure 8.6. 

The code and static data areas are replicated on all nodes — each node gets a copy of the 
whole program and all of its constants. Each node also has a stack that is used for calling into 
C procedures from Id. The Id run-time system is implemented in C and may also use the C 
stack. 

The frame area on the computation nodes contains the activation frames for every Id pro- 
cedure invocation. When a procedure is invoked, the run-time system chooses a processor on 
which to allocate its frame according to a built-in load balancing strategy. Then, the run-time 
system sends a frame allocation request to that processor in a split-phase transaction, which 
allocates a frame in its own frame area and returns a pointer to it to the calling routine. This 
mechanism distributes the dynamic tree of activation frames across all the computation nodes. 

An activation frame is deallocated by the last microthread of its associated procedure and 
may be reused subsequently. In order to avoid confusion due to stale data lying around from 
previous allocations, the Id compiler arranges for the first microthread of each procedure to clear
all frame-slots that may contain pointers. This helps in identifying valid data within the frame 
during garbage collection. 

The heap area on the memory nodes contains all of the heap-allocated Id objects. The heap 
area is further divided into the interleaved and the non-interleaved area. The non-interleaved 
area is used for small sized objects contained wholly within the same node, while the interleaved 
area is used to allocate large objects that are spread across all the memory nodes to avoid 
allocation imbalance and reduce memory contention. In order to simplify our study, we only 
used the non-interleaved heap area. 

In our implementation of Id on *T, all scalar objects and pointers to heap objects are 64 
bits in size. Furthermore, these pointers are always aligned on 8-byte boundaries when stored 
in memory. Each 64-bit double word in the heap has an associated 2-bit presence value in the 
presence-bit area. These presence bits are used to implement Id’s I-structure [ANP89] and
M-structure [BNA91] synchronization operations.


⁶ If the network is blocked, the message is buffered and is tried again at a later point. Thus, the currently
executing microthread is guaranteed to terminate without blocking.
Figure 8.6: The Organization of Computation Nodes and Memory Nodes in the *T machine.


We also use the non-interleaved heap area to keep any deferred-read and locked-take
continuations for the I-structure and M-structure operations respectively. These continuations
represent incomplete split-phase memory accesses whose second phase would complete when 
the corresponding heap data becomes available. Therefore, the heap objects carrying these 
continuations are always considered to be live and should never be garbage collected. On the 
other hand, since our system does not perform tail-calls, pointers to activation frames con- 
tained within such continuations are always accessible through the dynamic tree of activation 
frames. Therefore, these continuations do not have to be scanned for live pointers. Currently, 
our run-time system permanently marks such objects as live and manages their allocation and 
deallocation separately. Also, the garbage collector treats their contents as scalar data. A 
cleaner solution would have been to designate a separate heap area for allocating such deferred 
continuations so that the garbage collector never sees them. 

Our compiler and run-time system never store a pointer to the interior of an object in a 
frame-slot or another Id object. Therefore, a pointer found within a frame or a heap object
always points to the head of the active area of the object. The active area of the object is 
actually preceded in memory by some information managed by the run-time system including 
the object’s size (used for deallocation), a mark-bit (used by the garbage collector), and the 
time when it was allocated (in instruction cycles — for statistics collection). 


8.5.4 Garbage Collection on *T 


Garbage collection on *T can be initiated either by request from the Id program or by the 
run-time system when one of the processors finds out that it is running out of heap storage. 
Our current policy is to initiate garbage collection when the allocated storage on a node reaches 
a specified fraction (say, 0.75) of its total storage. 

Since the heap is shared globally, all processors must participate in a global garbage collec- 
tion. Therefore, when one processor decides to do garbage collection, all other processors are 
informed about it. Currently, we have implemented a simple stop-and-collect garbage collection 
scheme. 

First, the computation nodes stop processing computation microthreads and drain all mes- 
sages out of the network because the messages may carry live pointers to heap objects. As 
messages are drained from the network, their handlers are invoked. Our compiler ensures that 
the computation message handlers may modify memory locations or fork other microthreads, 
but they are not allowed to send more messages.⁷ We can handle all messages and eventually
reach quiescence, as long as we do not run any threads scheduled by the message handlers. 
Since we invoke message handlers as the network drains, there are no queues of messages to 
consider as part of the root-set during garbage collection. 

Once the network is drained, all processors synchronize and then initiate the mark phase. 
In this phase, all live and reachable objects residing on the memory nodes are marked according 
to one of the object identification techniques starting from the distributed tree of activation 
frames residing on the computation nodes. This process requires global communication among 
processors to mark objects distributed across the machine. After global marking is completed 
on all nodes, the processors synchronize again and then each memory node begins a local sweep 
phase. A final synchronization is performed after sweeping is completed on all nodes, and then 
the Id threads are allowed to resume computation on the computation nodes. 
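
The mark and sweep phases can be illustrated on a single node with the Python sketch below. It abstracts away object identification (each object simply lists its pointers) and all inter-processor synchronization; the Obj class and the address values are assumptions of the sketch.

```python
# Hypothetical single-node sketch of the mark and sweep phases.
# The heap maps addresses to objects; activation frames hold the root pointers.

class Obj:
    def __init__(self, *pointers):
        self.pointers, self.marked = list(pointers), False

def mark(heap, addr):
    obj = heap[addr]
    if obj.marked:
        return                          # already visited
    obj.marked = True
    for p in obj.pointers:
        mark(heap, p)

def sweep(heap):
    free_list = []
    for addr in sorted(heap):
        if heap[addr].marked:
            heap[addr].marked = False   # reset for the next collection
        else:
            free_list.append(addr)      # unreachable: reclaim
    for addr in free_list:
        del heap[addr]
    return free_list

def collect(heap, frames):
    # Mark phase: start from every pointer held in an activation frame.
    for frame in frames:
        for addr in frame:
            mark(heap, addr)
    # On *T, all processors would synchronize here before sweeping.
    return sweep(heap)

heap = {0: Obj(8), 8: Obj(), 16: Obj(24), 24: Obj(16)}
freed = collect(heap, frames=[[0]])
```

The two objects at addresses 16 and 24 refer only to each other, so they are unreachable from the frames and are reclaimed by the sweep.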


Type-Reconstructed Garbage Collection 


The mark phase of the Type-Reconstructed Garbage Collection (TRGC) follows the compiler- 
directed object identification scheme described earlier. Currently, we have only implemented 
the interpreted marking scheme with full type reconstruction as described in Section 8.4.1. 

During the mark phase, the frame memory of each computation node is traversed locally 
to find the activation frames that belong to the current dynamic activation tree. Each activa- 
tion frame that is currently in use is type-reconstructed according to the algorithm shown in 
Figure 7.5. Since the dynamic activation tree is distributed across processors, this process may 
require sending messages to non-local parent activation frames in order to obtain their use-type 
instantiations. 

Once a frame is reconstructed, its slots are searched for heap objects to be marked using their 
fully reconstructed types. We directly follow the type-code interpretation scheme of Figure 8.4


⁷ The run-time system message handlers are still allowed to send reply messages, but the number of such
messages is fixed.
by examining the type constructor for the current frame-slot to see if it refers to a structured
datatype. If so, the value in the frame-slot is parsed as a pointer and a request for marking the 
corresponding object is sent to its home node along with its fully reconstructed type packed 
within the requesting message. At the home node, the object and its contents are marked 
according to the marking schema shown in Figure 8.3. 

Note that, although type reconstruction of a frame must precede marking within that frame, 
it may be overlapped with type reconstruction or marking of other frames or heap objects. 


Conservative Garbage Collection 


The mark phase of the Conservative Garbage Collection (CGC) requires no source type infor- 
mation. Conservative garbage collectors use a simple, conservative test to determine whether 
a value in a frame or a heap object is a pointer to another object. Since pointers are identified 
conservatively, CGC may assume that there are live references to an object when there are none, 
therefore some objects may remain uncollected. Also, CGC cannot compact or copy all objects 
because conservatively identified pointers cannot be updated. However, there are some more 
sophisticated schemes that allow compaction and/or copying of a large fraction of the heap 
objects [Bar88]. Finally, CGC has no knowledge of the source types, therefore it must examine 
every slot of every reachable object and no short-circuiting based on scalar-type information is 
possible. 

As in the case of TRGC, the mark phase of CGC begins on the computation nodes by 
traversing their frame memory and identifying the activation frames currently in use. For each 
activation frame in use, we apply the conservative pointer test on each of its frame-slots as 
follows: 


1. First, we check to see if the 64-bit value contained within the frame slot is non-zero and 
is aligned to a 64-bit boundary. If not, then the value is a scalar. 


2. Next, we parse the value as a potential global pointer and determine its home node. If 
the node address falls outside the known range of addresses for memory nodes, the value 
is a scalar. 


3. Finally, we send a message to the home node to check if the value is a valid pointer. At 
the home node, we test whether the value points within the allocated heap area and that 
it points to the head of an actual heap object. The latter test is made possible because 
the run-time system marks the head of each allocated object with a special presence-bit 
pattern. Furthermore, the system guarantees that actual pointers never point to the 
interior of objects. Therefore, this test may be carried out by simply checking for the 
special presence-bit pattern at the head of the pointer value. If this test succeeds then 
the value is considered to be an actual pointer and the object is marked, otherwise the 
value is taken to be a scalar. 


The test may mark some objects that are not actually reachable because a value in memory 
happens to look like a pointer to that object. However, the test is guaranteed to mark only 
actual heap objects because it checks for the special allocation presence-bit pattern. 

Once a value has been determined to be a pointer, the fields of the object it points to are 
scanned for potential references to other objects in a similar fashion. 
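
The three-step test above can be sketched in Python as follows. The pointer layout (node id in the upper 32 bits, local address in the lower 32 bits), the range of memory-node ids, and the MemoryNode class are all assumptions made for the sketch; the set of object heads stands in for the special presence-bit pattern.

```python
# Hypothetical sketch of the three-step conservative pointer test.

MEMORY_NODES = range(4, 8)                 # assumed ids of the memory nodes

class MemoryNode:
    def __init__(self, heap_start, heap_end, object_heads):
        self.heap_start, self.heap_end = heap_start, heap_end
        self.object_heads = set(object_heads)   # stands in for the head presence-bit pattern
        self.marked = set()

    def check_and_mark(self, local_addr):
        # Step 3, at the home node: inside the allocated heap and at an object head?
        if not (self.heap_start <= local_addr < self.heap_end):
            return False
        if local_addr not in self.object_heads:
            return False
        self.marked.add(local_addr)
        return True

def conservative_test(value, nodes):
    if value == 0 or value % 8 != 0:       # step 1: non-zero and 8-byte aligned
        return False
    node_id, local_addr = value >> 32, value & 0xFFFFFFFF
    if node_id not in MEMORY_NODES:        # step 2: home node must be a memory node
        return False
    return nodes[node_id].check_and_mark(local_addr)   # step 3 (a message on *T)

nodes = {n: MemoryNode(0x1000, 0x2000, {0x1000, 0x1040}) for n in MEMORY_NODES}
results = [conservative_test(v, nodes) for v in
           (0, 0x1001, (9 << 32) | 0x1000, (5 << 32) | 0x1008, (5 << 32) | 0x1040)]
```

Only the last value passes all three steps. The fourth is aligned and lies within the heap of a valid memory node, but it does not point to an object head and is therefore treated as a scalar.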


Compiler-Directed Storage Reclamation 


For comparison purposes, we have also implemented the explicit, compiler-directed storage 
reclamation scheme (CDSR) within the same compiler and run-time system framework. In 
this scheme, no separate garbage collection needs to be performed: the compiler inserts code 
to deallocate an object when it can determine the object to be garbage. This analysis has a 
substantial compile-time cost compared to the other two storage management schemes. Also, 
the static analysis may not be able to reclaim all the garbage that is generated by the program. 

The run-time costs of this scheme may be divided into a small synchronization cost that 
schedules the deallocation of an object when all its references are dead and the actual cost of 
deallocating the object. The former cost is negligible and is also hard to separate from the user 
program because it is built into the microthread partitioning and synchronization of the user 
program. The second cost is the same as the basic cost of sweeping the unused objects as in the 
other garbage collection schemes and therefore forms the basis of our comparison with those 
schemes. 

We use the CDSR scheme to compare its relative storage management efficiency to that 
of the garbage collected schemes. It is also possible to simultaneously use the explicit storage 
management scheme to get most of the large objects along with a garbage collector that catches 
the smaller, harder to analyze objects. We believe that a mixed approach may yield better 
performance than either scheme on its own. 


8.6 Performance Results and Analysis 


We are interested in two aspects of the performance of the type-reconstructed garbage collection 
(TRGC): how long it takes to garbage collect, and how much garbage it reclaims. We compared 
several programs running with TRGC, conservative garbage collection (CGC), and compiler- 
directed storage reclamation (CDSR). 

In preparing a uniform execution platform, we naturally had to accommodate the require- 
ments of each storage management scheme within the same run-time system. This resulted in 
a system that was not tuned to any particular storage management scheme. For instance, a 
copying or compacting garbage collector could not be used for TRGC since our simple-minded 
scheme for conservative garbage collection would not work in that setup. Similarly, the run- 
time system had to maintain free-lists for reclaimed objects since we wanted to perform explicit 
storage management within the same framework. 

Thus, the results we obtained cannot be treated as an absolute measure of performance for 
any particular scheme. On the other hand, they provide a good measure of relative performance 
of the object identification mechanisms studied and also characterize systems where more than 
one storage management strategy is used. 


8.6.1 Benchmark Runs 


We used four different benchmarks. Quicksort is the standard recursive algorithm for sorting 
N list elements parameterized by a polymorphic comparison predicate. Paraffins generates and 
counts the number of distinct paraffin isomers of up to N carbon atoms. Gamteb is a Monte 
Carlo simulation of N photons impinging on a carbon rod divided into two cells. Finally, 
Wavefront consists of 10 iterations of a successive over-relaxation kernel on an N × N matrix
containing floating-point data.
Quicksort: Instruction Cycles (×1000)

    Columns: Mode | N | Heap (Wds) | GCs | Id | Id RTS (Basic, Mark, Sweep, TREC, Total) | Idle | Total
    Rows: TRGC, CGC, and CDSR for each problem size N


Paraffins: Instruction Cycles (×1000)

    Columns: Mode | N | Heap (Wds) | GCs | Id | Id RTS (Basic, Mark, Sweep, TREC, Total) | Idle | Total
    Rows: TRGC, CGC, and CDSR for each problem size N


Figure 8.7: Performance Results for Quicksort and Paraffins. 


For each of the programs we tested, we ran three versions: TRGC, CGC, and CDSR. The 
TRGC version is the program running with type-reconstructing garbage collection. The CGC 
version is running with conservative garbage collection, and the CDSR is the automatically 
annotated version running with no garbage collection. Both garbage collectors used the mark 
and sweep algorithm, and used the same implementation of sweeping and inter-processor syn- 
chronization. Using a simple GC algorithm allowed us to separate the basic heap management 
cost (allocation and deallocation) from the overall cost of garbage collection. Thus, the cost 
of object traversal and marking of TRGC and CGC can be truly ascribed to their respective 
object identification strategies. 

In all three cases, actual heap storage management and statistics collection was performed by 
the same Id run-time system. Although statistics gathering was mildly intrusive, it constituted 
a tiny fraction of total cycles executed. Online statistics processing (re-sampling profiles) was 
not counted. 
Gamteb: Instruction Cycles (×1000)

    Columns: Mode | N | Heap (Wds) | GCs | Id | Id RTS (Basic, Mark, Sweep, TREC, Total) | Idle | Total
    Rows: TRGC, CGC, and CDSR for each problem size N


Wavefront: Instruction Cycles (×1000)

    Columns: Mode | N | Heap (Wds) | GCs | Id | Id RTS (Basic, Mark, Sweep, TREC, Total) | Idle | Total
    Rows: TRGC, CGC, and CDSR for each problem size N


Figure 8.8: Performance Results for Gamteb and Wavefront.


We simulated several problem sizes on a single processor with each program and storage 
management scheme. Figure 8.7 and Figure 8.8 show the performance results for each of the 
benchmarks. The first two columns identify the storage management scheme (Mode) and the 
input size (N). The next two columns show the maximum heap size used (Heap) during each run 
measured in 32-bit words, and the number of garbage collections performed (GCs). Subsequent 
columns record timing information for various categories of instructions measured in Kcycles. 
In each of the garbage collected runs, the run-time system initiated the garbage collection when 
the currently allocated space exceeded 75% of the total heap space. Garbage collection was 
switched off for CDSR runs. 

The timing information for each benchmark run is broken up into several categories. The 
amount of time spent in Id computation threads (Id) includes basic computation work, math- 
library subroutine calls, split-phase memory referencing and program I/O. The time spent in 
the run-time system (Id RTS) is classified into the time spent in basic storage management 
Figure 8.9: Total Cost and Run-time System Cost for the Benchmarks.


(allocation/deallocation), frame and object marking during garbage collections, object
sweeping, and type reconstruction. The remaining time is spent idling through the scheduling loop
waiting for messages to arrive through the network.⁸


8.6.2 Performance Analysis


Time Analysis


The total instruction cycles and the cycles spent in the run-time system (including garbage 
collection) for all the runs are summarized in Figure 8.9. These curves give an idea of the 
growth of run-time system cost of the various schemes as a function of problem size and as a 
fraction of the total cost. 

Several trends are apparent from Figure 8.9. The CDSR scheme consistently has the lowest 
run-time cost since it does not perform any garbage collection and only incurs the basic heap 
and frame management cost (allocation and deallocation). The fraction of time spent in the 
run-time system varies widely depending upon the nature of the application and the cost and 
the number of garbage collections performed. For example, Paraffins allocates a lot of 
small-sized data-structures, keeping them live until the very end. Thus, each mark phase has to do a 
lot of work. Similarly, Quicksort rapidly unfolds into a tree of activation frames each of which 
holds onto a substantial amount of storage, so the cost of marking is high there as well. On 
the other hand, for Gamteb, the size of the live heap is quite small so the garbage collected 
schemes incur very little overhead. 

† Even if only a single processor is used out of a multi-processor *T configuration, all messages are sent out 
to the network and received after some delay. This may cause idle cycles on the processor if it does not have 
anything else to do. 

Figure 8.10: Run-time System Cost Breakup. 

Comparing the relative run-time costs of TRGC and CGC, we find that for Quicksort 
and Paraffins, TRGC does worse than CGC, while for Wavefront TRGC performs better. This 
wide variation can be explained by examining the run-time cost breakup shown in Figure 8.10 
for the largest runs. There, we split the basic storage management cost shown in Figure 8.7 and 
Figure 8.8 between the cost of managing the frame area and the cost of managing the heap. 
The marking cost is similarly split between the cost of marking the frames and the cost of 
marking the live heap objects. 

Figure 8.10 shows that TRGC spends a significant amount of time in the type reconstruction 
phase for both Quicksort and Paraffins. This is because both these benchmarks contain 
several polymorphic functions. Thus, the type reconstruction mechanism has to generate and 
propagate the exact run-time type instantiation down from the root to each polymorphic frame 
in the dynamic call tree. On the other hand, the type reconstruction cost is hardly visible in 
Gamteb and Wavefront, which are not polymorphic and largely consist of first-order functions. 
Furthermore, during type reconstruction and interpreted marking, the run-time types are 
represented as C data-structures and are currently managed using the conventional malloc and free 
library calls. This cost can be substantially reduced by using a specialized version of malloc. 
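
The idea of propagating type instantiations from the root down the dynamic call tree can be sketched as follows. This is a simplified illustration, not the reconstruction algorithm of Chapter 7: the frame representation (a dictionary with hypothetical `name`, `inst` and `callees` fields), the tuple encoding of types, and the example function names are all assumptions made for this sketch.

```python
def substitute(ty, env):
    """Replace type variables (strings such as "'a") in ty using env.

    Types are either strings ("int", "'a") or tuples such as ("list", "'a").
    """
    if isinstance(ty, str):
        return env.get(ty, ty)
    return (ty[0],) + tuple(substitute(t, env) for t in ty[1:])


def reconstruct(frame, caller_env=None):
    """Compute the exact type instantiation of every frame below `frame`.

    frame["inst"] maps each type variable of the frame's function to a type
    expressed over the CALLER's type variables (static call-site information).
    """
    caller_env = caller_env or {}
    own_env = {v: substitute(t, caller_env) for v, t in frame["inst"].items()}
    result = {frame["name"]: own_env}
    for callee in frame["callees"]:
        result.update(reconstruct(callee, own_env))
    return result


# Hypothetical call tree: main -> quicksort['a := int] -> partition['b := list 'a]
call_tree = {
    "name": "main", "inst": {}, "callees": [{
        "name": "quicksort", "inst": {"'a": "int"}, "callees": [{
            "name": "partition", "inst": {"'b": ("list", "'a")}, "callees": [],
        }],
    }],
}
types = reconstruct(call_tree)
```

In this sketch each polymorphic frame costs one substitution per type variable, which is consistent with the observation that reconstruction cost grows with the number of polymorphic frames in the call tree.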

The marking cost of TRGC is also about 1.5 to 2.2 times higher than that of CGC in the case 
of Quicksort of 100 elements and Paraffins of 13 carbon atoms. Our current implementation 
interprets the type structures at run-time in order to traverse and mark the corresponding 
run-time objects. This interpretation overhead could be eliminated by using the compiled marking 
schema as described in Section 8.4.2 where the compiler generates a specialized marking routine 
for each source type parameterized over its polymorphic variables. Furthermore, these routines 
can be inlined to produce highly optimized traversal and marking functions for each user-defined 
function activation frame. 
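
As a rough illustration of a marking routine that is specialized per type and parameterized over its polymorphic variable, consider the sketch below. It is not the schema of Section 8.4.2; the cons-cell representation (a two-element Python list `[head, tail]`, with `None` as the empty list) and all function names are assumptions chosen for brevity.

```python
def mark_scalar(value, marked):
    """Scalars occupy no separate heap object, so there is nothing to mark."""


def make_mark_list(mark_elem):
    """Build the marking routine for `list a`, given the routine for `a`."""
    def mark_list(cell, marked):
        # Walk the spine, marking each cons cell and delegating the elements.
        while cell is not None and id(cell) not in marked:
            marked.add(id(cell))
            mark_elem(cell[0], marked)
            cell = cell[1]
    return mark_list


# Specializations obtained by instantiating the type variable.
mark_int_list = make_mark_list(mark_scalar)           # list int
mark_int_list_list = make_mark_list(mark_int_list)    # list (list int)

inner1 = [1, [2, None]]
inner2 = [3, None]
outer = [inner1, [inner2, None]]

marked = set()
mark_int_list_list(outer, marked)
```

No type structure is inspected while marking here: the instantiation is fixed once, when the specialized routine is built, which is the property that removes the interpretation overhead.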

In the case of Wavefront, TRGC takes much less time than CGC, and very little more 
time in total than CDSR, where no marking at all took place. For Wavefront of 40 x 40, the 
marking cost of CGC is 25 times higher than that of TRGC. TRGC did so well because it 
could determine that the arrays contained only scalar data by inspecting their run-time type. 
Therefore, it only marked the arrays themselves and did not scan for pointers inside them, as 
CGC did. This scanning cost depends on the total size of the arrays and was responsible for 
the quadratic growth in run-time cost for CGC as shown in Figure 8.9. However, sweeping took 
the same amount of time for both TRGC and CGC. 

The Wavefront example shows that in an ideal situation, the time to mark the heap for TRGC 
is proportional to the total number of live object references, rather than the total amount of live 
storage as it is for CGC. TRGC can use the reconstructed type information to avoid scanning 
elements of scalar arrays and scalar fields within records and algebraic types. 
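
The difference between the two marking strategies can be made concrete with a small model. This is a sketch under simplifying assumptions, not either collector's implementation: the heap is a dictionary from addresses to lists of words, a word counts as a pointer for the conservative marker whenever it equals a heap address, and types are encoded as `"scalar"`, `("array", elem)` or `("record", [fields])`. All names are hypothetical.

```python
def mark_conservative(heap, roots):
    """Scan every word of every reachable object, guessing which are pointers."""
    marked, scanned = set(), 0
    stack = [r for r in roots if r in heap]
    while stack:
        addr = stack.pop()
        if addr in marked:
            continue
        marked.add(addr)
        for word in heap[addr]:
            scanned += 1
            if word in heap and word not in marked:
                stack.append(word)
    return marked, scanned


def mark_typed(heap, typed_roots):
    """Follow only the words that the type says are object references."""
    marked, scanned = set(), 0
    stack = list(typed_roots)  # (address, type) pairs
    while stack:
        addr, ty = stack.pop()
        if addr in marked:
            continue
        marked.add(addr)
        if ty[0] == "array":
            if ty[1] == "scalar":
                continue  # scalar array: mark the object, do not scan it
            for word in heap[addr]:
                scanned += 1
                stack.append((word, ty[1]))
        elif ty[0] == "record":
            for word, field_ty in zip(heap[addr], ty[1]):
                if field_ty != "scalar":  # skip scalar fields
                    scanned += 1
                    stack.append((word, field_ty))
    return marked, scanned


# An array of two rows, each a 40-element array of scalars.
heap = {1000: [1001, 1002], 1001: [0] * 40, 1002: [7] * 40}
cons_marked, cons_scanned = mark_conservative(heap, [1000])
typed_marked, typed_scanned = mark_typed(
    heap, [(1000, ("array", ("array", "scalar")))])
```

Both markers find the same three live objects, but the conservative one examines every live word while the typed one examines only the two row references.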


Space Analysis 


In terms of space usage, TRGC and CGC perform identically. As shown in Figure 8.7 and 
Figure 8.8, both perform the same number of garbage collections in all runs 
and use roughly the same amount of heap storage. Both TRGC and CGC runs were provided 
with the same amount of initial storage, which was kept sufficiently large to avoid 
thrashing. This accounts for the small number of garbage collections performed. 

Each garbage collected run also performed a final GC at the end of the run to reclaim all the 
uncollected garbage. Due to this final garbage collection, the TRGC and CGC runs actually 
reclaimed more storage than the CDSR runs, because the compiler could not insert deallocation 
commands for all of the temporary storage. 

CGC is able to reclaim all the garbage because of our restrictive compilation model and 
support from the run-time system. As mentioned earlier, in our system all actual pointers 
directly point to the head of a heap object. This not only reduces the overhead of guessing 
whether a given value is a valid heap pointer or not but also avoids creating many more 
ambiguous pointers for the garbage collector to check. The run-time system further eliminates 
the chance of a wrong guess by marking the head of every object with a special 
bit-pattern. 
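
A conservative pointer test of the kind described above might look like the following sketch. The header layout is purely hypothetical: the text does not give the actual bit-pattern, so `HEADER_TAG`, `TAG_SHIFT` and the helper names are assumptions, and the sketch makes no attempt to reproduce the guarantees of the real run-time system.

```python
HEADER_TAG = 0xA5   # hypothetical bit-pattern; the actual value is not given
TAG_SHIFT = 24      # hypothetical: tag stored in the top 8 bits of a 32-bit word


def make_header(size_words):
    """Build an object header word carrying the tag and the object size."""
    return (HEADER_TAG << TAG_SHIFT) | size_words


def looks_like_pointer(memory, heap_start, heap_end, word):
    """Accept `word` as a pointer only if it addresses a tagged object head."""
    if not (heap_start <= word < heap_end):
        return False  # outside the heap: certainly a scalar
    return (memory[word] >> TAG_SHIFT) == HEADER_TAG


# A 16-word memory whose heap occupies addresses 4..15, holding one object
# at address 4 with a header followed by three data words.
memory = [0] * 16
memory[4] = make_header(3)
memory[5:8] = [42, 43, 44]
```

Because every genuine pointer refers to an object head, a value that falls in the middle of an object (address 5 here) can be rejected without any further search.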

The performance of CDSR varies with the application. For Gamteb and Wavefront, CDSR 
is able to insert deallocation commands to reclaim all the garbage automatically. Therefore, 
these benchmarks are able to run under CDSR without leaking any storage. The garbage 
collected versions of these benchmarks had to be given 2-10 times the storage used by CDSR 
to avoid thrashing. On the other hand, for Paraffins and Quicksort, CDSR is able to reclaim 
only 10-20% of the total garbage, so the TRGC and CGC versions are able to run in 
the same or less storage than the CDSR version without thrashing. This shows that in general, 
CDSR may need additional storage reclamation support from an independent garbage collector, 
although it works very efficiently for applications where data-structures are easily analyzed. 


8.7 Conclusions 


In this chapter, we have described a direct application of complete run-time type reconstruction, 
namely, tagless garbage collection (TRGC). We used the reconstruction algorithm described in 
Chapter 7 to reconstruct the exact types of all run-time objects. We also described an 
interpreted and a compiled marking schema for traversing and marking live run-time objects using 
the reconstructed type information. We have implemented the interpreted marking schema on 
a simulator for the *T architecture and compared its performance with conservative garbage 
collection (CGC) and compiler-directed storage reclamation (CDSR) on several benchmarks. 

Our results show that in general, TRGC does more work in marking the live objects than 
CGC, unless it can avoid scanning large, scalar, array-like objects using type information. 
The type reconstruction overhead increases with the amount of polymorphism and 
higher-order functions (closures) used in the program, although the cost of reconstruction is small 
compared to the cost of marking live objects with type interpretation. The marking cost 
itself should be reduced considerably by using the compiled marking schema instead of 
type interpretation. 

TRGC has the additional advantage that other storage reclamation schemes may be used, 
such as compaction or copying. These may not be used with CGC because they require updating 
live pointers, and CGC cannot guarantee that what it treats as a pointer is not really a scalar 
value. On the other hand, TRGC requires initialization of polymorphic and pointer data with 
valid values and cannot cope with stale data as CGC can. 

CDSR consistently does better than either of the garbage collection schemes in terms of 
time spent in the run-time system, as expected. However, it is sometimes unable to 
collect all the garbage and therefore requires more memory than strictly necessary. CDSR also 
takes much longer to compile, sometimes increasing compile-time by a factor of 10. 

On the whole, type reconstruction and type-reconstruction-based garbage collection seem 
to be a promising area of research with a lot of scope for compiler optimization and run-time 
performance improvement. This initial study has shown that type-reconstruction-based garbage 
collection is certainly feasible and can be competitive with other storage management strategies 
under the right mix of applications. 


8.7.1 Future Work 


There are several dimensions in which further investigation would be useful. The first step would 
be to implement the compiled marking schema and compare its performance with our current 
interpreted marking schema. We expect to see a substantial improvement in performance using 
specialized marking functions. Our experience also shows that mixed storage management 
schemes that combine garbage collection with explicit storage reclamation within the same 
run-time environment are feasible and may be able to combine the benefits of each scheme 
running on its own. 

Although our system has been designed and implemented for a multi-processor architecture, 
we have so far studied its behavior on only a single processor. We would like to see how TRGC 
scales in a multi-processor environment and quantify the inter-processor communication 
overhead for type reconstruction. 

It would be very interesting to compare the performance of TRGC with an explicitly tagged 
object identification scheme implemented within the same framework, in order to determine 
whether TRGC offers any concrete advantages over that technique. 

Finally, it would be useful to implement a compacting garbage collector based on type 
reconstruction with a very simple allocation scheme (bumping a pointer) and compare its heap 
management overhead with that of CGC and CDSR, which require a more sophisticated 
storage management scheme (free-lists). 
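
The contrast between the two allocation schemes mentioned above can be sketched as follows. Both classes are illustrative assumptions rather than the allocators used in the thesis: the free-list version uses first fit and omits coalescing, and all names are hypothetical.

```python
class BumpAllocator:
    """Allocation by bumping a pointer; relies on compaction to free space."""

    def __init__(self, size):
        self.top, self.limit = 0, size

    def alloc(self, n):
        if self.top + n > self.limit:
            return None  # a compacting collection would be triggered here
        addr = self.top
        self.top += n
        return addr


class FreeListAllocator:
    """First-fit allocation from a list of (start, length) free blocks."""

    def __init__(self, size):
        self.free = [(0, size)]

    def alloc(self, n):
        for i, (start, length) in enumerate(self.free):
            if length >= n:
                if length == n:
                    del self.free[i]  # block used up exactly
                else:
                    self.free[i] = (start + n, length - n)  # split the block
                return start
        return None

    def free_block(self, start, n):
        self.free.append((start, n))  # no coalescing in this sketch
```

The bump allocator does constant work per allocation, whereas the free-list allocator may search several blocks and must also handle explicit deallocation, which is the overhead the proposed comparison would measure.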